I. INTRODUCTION


The purpose of this study is to consider ways 
in which labour force participation rates, obtained 
from the 1961 Census of Canada, cross-classified 
by a set of demographic, social or economic factors, 
can be related functionally to the different levels 
of those factors. It will be concerned with the
examination and development of statistical models
designed to represent these relationships and the
testing of these models. The analysis of the results
will, therefore, be in no way as detailed, in subject
matter content, as would be required in a full
empirical treatment of the variables.


Before proceeding it may be relevant to consider 
why it was felt that such a study is needed at all. 
Other studies have already successfully examined 
the relationship between demographic, social and 
economic variables and employment characteristics 
of the population using statistical techniques which 
have ‘‘explained’’ a considerable proportion of the 
variation in labour force participation rates between 
different sub-groups in the population.¹ To be justified,
a study of this kind must therefore either
(1) throw up evidence of serious weaknesses in the 
methods employed in these studies, or (2) it must 
show how significant improvements can be made in 
the predictive and analytical power of the model 
used. It was the latter condition only which was a 
consideration in the preparation of this study. It 
will be shown that simple additive models in which
all the independent variables are expressed as
dummies² and which yield results that are also
simple to interpret, can be used to explain much
of the observed differences in the labour force
participation rates, particularly when the independent
factors (variables) are few in number.³ It will also
show how the analysis of variance, discussed in
some detail on page 12, adds considerably to the
analytical power of researchers when the data is in
the form of a completely balanced factorial
arrangement.⁴ This will more often than not be the case
when the data has been obtained from the census.
But moving away from the simple type of model to 
ones which impose conditions on the form which the 
dependent variable can take—conditions which are 


1 See Sylvia Ostry, The Female Worker in Canada, 
one of a series of Labour Force Studies in the Census 
Monograph Programme, Ottawa, Queen’s Printer, 1968, 
and Dominion Bureau of Statistics, Special Labour Force 
Studies No. 2, Series B, Women Who Work: Part 2, by 
John D. Allingham and Byron Spencer, Ottawa, Queen’s 
Printer, 1968. 


inherent in the nature of the variable itself—results 
are obtained which are, in the author’s view, not only
conceptually more correct but also ‘‘explain’’ more 
of the overall variation in labour force participation 
rates. It will, however, also be seen that the
increase in the ‘‘power’’ of the more complicated
models will not always be such as to justify their 
use in preference to the more simple models. 


The study is divided into six sections. Following
this introduction, Section II, Quantal Response,
will briefly discuss how labour force
participation rates, as a statistical variable, differ
from other variables and will show why this is 
worth taking into account in the construction of the 
model. Section III, which will examine the simple 
additive model, will begin with an introduction to 
the methods of analysis of variance and dummy 
variable regression analysis which will also be of 
use in later sections in the study. The reason why 
so much more time and space is devoted to analysis 
of variance than to dummy variable regression is 
found in the author’s view that analysis of variance 
in its widest sense (that is, not confined to the 
simple partitioning of the mean squares in regression
analysis) is much neglected in books on econome- 
trics and economic statistics, even though it often 
provides an ideal means of examining the data to 
obtain some notion of the ‘‘contribution’’ that inde- 
pendent variables make, both individually and in 
association with other variables, to the variation in 
the dependent variable. The technique is therefore 
of considerable value without the question of
‘‘testing hypotheses’’, which is its raison d’être,
necessarily being considered. On the other hand, problems
encountered in the use of dummy variables in regres- 
sion analysis applied to economic data are now 
fairly well documented, and only a brief description 
of this approach is given. 


The models developed in Sections IV and V will 
stem from the removal of some of the simplifying 
assumptions implicit in the simple additive model. 
A summary—Section VI—will then conclude the 
study with the exception of a bibliography of method- 
ological studies. 


2 See Section III, page 16. 

3 See Sylvia Ostry, op. cit. and John D. Allingham 
and Byron Spencer, op. cit. 

4 The term completely balanced factorial design 
refers to that arrangement of the data in which an equal 
number of observations are available for all factor/level 
combinations. 


II. QUANTAL RESPONSE


It is quite common, in the application of sta- 
tistical methods to problems encountered in many of 
the applied sciences, to think in terms of a stimulus 
and a response.⁵ In biological assay, for example,
an insect is subjected to a dose of an insecticide 
(stimulus) and it either lives or dies (the response). 
Or a family with a given set of socio-economic 
characteristics may or may not have an automobile. 
In both of these examples the response is said to be 


5 See D.J. Finney, Probit Analysis, A Statistical 
Treatment of the Sigmoid Response Curve, Cambridge 
University Press, 1952, and J. Aitchison and J.A.C.
Brown, The Lognormal Distribution, Cambridge, 1957.




quantal, that is, it is of the type ‘‘all or nothing’’:
the insect must live or die and the family must either
own or not own an automobile. It should be clear then,
from the above, that the problem which is to be
considered in this study is one of how to deal with data
obtained from a ‘‘quantal response’’ situation: an
individual can only be either in, or not in, the labour
force.⁶


6 For the purpose of this study the question of the
intensity with which an individual can be in the labour 
force, i.e. whether the person is working, or wishes to 
work for 5 hours or 50 hours a week, will be ignored. 




Now consider the situation where data relating 
to the labour force status of the population has been 
obtained from a census. In these cases the concept 
of the individual can still be retained but the data 
also reduced to a manageable size by considering 
all persons together who have common social and 
economic characteristics. All persons then, who 
have the same age, sex, marital status, etc., can 
be grouped and treated as a single observation.
Obviously such a group of persons will no longer 
have an associated dependent variable of zero or 
one since it is to be expected that some of the 
members of that class will be in the labour force 
and some will not. If $n_{1i}$ is the number in the ith
class who are in the labour force and $n_{0i}$ is the
number who are not, then $P_i$, the labour force
participation rate of the ith class, is defined as

$$P_i = \frac{n_{1i}}{n_i}$$

where $n_i = n_{1i} + n_{0i}$ and $0 \le P_i \le 1$.

A group of persons can now be thought of as 
being ‘‘subjected’’, say, to a given level of educa- 
tion and the response rate is then the percentage 
of the group who are in the labour force. Similarly 
the income level of the husband may be thought of 
as the stimulus, albeit a negative one, in the case 
of married women: it would be expected that the 
labour force participation rate of a group of married 
women whose husbands had incomes of, say, $5,000 
a year would be different, and on a priori reasoning 
higher, from that of another group for whom the hus- 
band’s income was $10,000 a year. 


Even though controlled experiments cannot be 
carried out, censuses do provide the means by which 
data can be so ordered as to classify the population 
by those socio-economic variables which are known, 
or which are assumed to influence the likelihood of 
individuals entering the labour force. 


What then is so special about this type of 
variable and how does it differ from other variables 
arising out of a quantitative rather than a quantal 
response situation? The answer to this question is 
found in the nature of the variable itself and by 
considering the way in which the variable changes 
in response to changes in the levels of the inde- 
pendent variables. By definition a proportion or 
percentage cannot lie outside the range of 0 to 1, 
or 0 to 100, so that any model which attempts to 
explain a dependent variable in that form should 
also be constrained to yield estimates which lie 
within the appropriate range. It would, furthermore, 
seem illogical to assume that the effect of a change 
in the level of an independent variable, on a vari- 
able which is constrained in this way, should be 
the same at all levels of the other factors being 
examined. Thus it would be reasonable to assume 
that the expected increase in the proportion of 
married women going out to work due to an increase 
in their level of educational attainment from say 
‘‘some secondary’’ to ‘‘some university’’ would be
different for two groups which had respectively
10 per cent and 50 per cent labour force participation
rates at the lower educational level.⁷ In other words
some interaction between the effects of the factors
would be expected.⁸


The data used throughout this study and on 
which alternative models have been tested consists 
of the observed participation rates in 54 cross- 
classifications of married women in urban Ontario, 
obtained from the 1961 Census, categorised by one 
factor (with three levels) which defines the ‘child 
status’ of the family; one factor (also of three levels)
which defines the wife’s level of educational
attainment; and one factor, income of husband, which has
six levels on a continuous scale. These factors and
their levels are summarised below.


Factor                   Levels

Child status             No children; some children under six years of
                         age; no children under six years of age.

Wife’s education         Completed elementary or less; some high school
                         or completed high school; some university or
                         obtained a degree.

Income of husband        Less than $1,000; $1,000-$2,999; $3,000-$4,999;
(per annum)              $5,000-$6,999; $7,000-$9,999; $10,000 and over.


It should be noted that, when the data are 
organised in this way, the number of observations 
is defined by the product of the number of levels 
in each factor; thus 3 × 3 × 6 = 54. In order that
the reader can compare subsequent results with the 
original observations, Table 1 contains, for each 
of the 54 factor/level combinations, the number of 
married women in the labour force, the number of 
married women in the population 14 years of age 
and over and the associated labour force partici- 
pation rate. 
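The count of 54 observations can be reproduced by enumerating every factor/level combination. This is a minimal sketch; the level labels are shortened paraphrases of the levels listed above and the variable names are illustrative.

```python
from itertools import product

# Shortened labels for the levels summarised in the text.
CHILD_STATUS = ["no children", "some children under six", "no children under six"]
EDUCATION = ["elementary or less", "high school", "university"]
HUSBAND_INCOME = [
    "under $1,000", "$1,000-$2,999", "$3,000-$4,999",
    "$5,000-$6,999", "$7,000-$9,999", "$10,000 and over",
]

# One cell (observation) per combination of one level from each factor.
cells = list(product(CHILD_STATUS, EDUCATION, HUSBAND_INCOME))
print(len(cells))
```

Each tuple in `cells` corresponds to one row of participation data in Table 1.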


The choice of this sub-set of data was dictated 
by three considerations. First, for reasons which 
will become clear later, no zero or 100 per cent 
participation rates were wanted. Secondly, the range 
of participation rates should not be so limited as 
to bring into question the applicability of the tech- 
niques to the analysis of participation rates in 
general. And thirdly, at least two types of inde- 
pendent variables— quantitative and qualitative — 
should be included. In all other respects the sub-set 
was selected from a number of possible sub-sets of 
data which could have satisfied these three condi- 
tions. It must be stressed, however, that these con- 
ditions were only imposed to facilitate the presenta- 
tion of the methods employed: in no way do they 
imply that serious limitations exist in the use of 
the methods when these restrictions are removed. 


7 These different labour force participation rates, 
at the lower of the two educational levels, could exist 
because the other factors—age, husband’s income etc., 
will not always be the same. 

8 The technique of analysis of variance, which can 
be used to detect the presence of significant interaction 
effects, is discussed in Section III, page 12. 


TABLE 1. Population, Labour Force and Labour Force Participation Rates of Married Women in Urban Ontario
by Child Status, Income of Husband and Level of Educational Attainment, June, 1961

                                                                Education
Child status and                 Elementary                     Secondary                      University
income of husband        Popu-     Labour   Partici-    Popu-     Labour   Partici-    Popu-     Labour   Partici-
                         lation    force    pation      lation    force    pation      lation    force    pation
                                            rate (%)                       rate (%)                       rate (%)

Children under 6:
  $10,000 and over         292        22      7.53       2,807       129      4.60       1,132        75      6.63
  $7,000-$9,999            823        50      6.08       6,032       349      5.79  [illegible]       92      7.82
  $5,000-$6,999          4,518       458     10.14      16,666     2,168     13.01       1,016       173     17.03
  $3,000-$4,999         11,785     2,051     17.40      21,488     4,583     21.33         622       174     27.97
  $1,000-$2,999    [illegible] [illegible]   26.33  [illegible] [illegible]  27.86  [illegible] [illegible]  31.54
  Under $1,000             723       167     23.10         671       200     29.81  [illegible] [illegible]  35.71

No children under 6:
  $10,000 and over         539        71     13.17       4,104       418     10.19       1,234       130     10.53
  $7,000-$9,999          1,250       188     15.04       5,858     1,232     21.03         678       193     28.47
  $5,000-$6,999          5,381     1,303     24.21      12,284     4,265     34.72         584       304     52.05
  $3,000-$4,999         12,458     4,025     32.31      14,089     6,426     45.61         472       255     54.03
  $1,000-$2,999          4,462     1,701     38.12       2,659     1,395     52.46         129        82     63.57
  Under $1,000             786       302     38.42         571       284     49.74          44        16     36.36

No children:
  $10,000 and over         386        69     17.88       1,838       269     14.64         420        85     20.24
  $7,000-$9,999            795       112     14.09       2,614       850     32.52         402       175     43.53
  $5,000-$6,999          3,060       745     24.35       6,465     3,125     48.34         576       358     62.15
  $3,000-$4,999          9,591     3,032     31.61      12,968     7,458     57.51         632       415     65.66
  $1,000-$2,999          4,791     1,689     35.25       3,879     2,230     57.49         239       148     61.92
  Under $1,000             762       297     38.98         581       342     58.86          50        36     72.00

Source: 1961 Census.


III. SIMPLE ADDITIVE MODELS


It would be rare today for any research to break 
entirely new ground and earlier studies of a given 
subject will have undoubtedly provided some clues 
as to the expected relationship between inter- 
dependent variables. The analysis of labour force 
participation rates is no exception in this respect. 
However, it will often be the case that empirical 
research is being undertaken to explore the rela- 
tionship between variables which, in combination, 
have not been examined before. When this is so it 
would seem to be desirable that the first step in the 
analysis should be to determine not only the rela- 
tive importance of the effect which each independent 
variable has on the variation in the dependent vari- 
able, but also whether the effect of each variable 
changes significantly in response to changes in the 
other variables. The next step in the analysis is, 
then, to estimate the magnitude of any significant 
effects. 


To conform to standard terminology, variables, 
in this section, will be called factors, and two types 
of effect or influence have to be considered—the 
main effect of a factor and the interaction effects. 
The main effect of, say, the child status factor is 
defined as the differences between the contributions 
of each of the three levels of this factor to the vari- 
ation in the participation rates. The presence of a 
two-factor interaction arises, for example, when the 
three child status effects are not the same over, 


say, the three levels of the wife’s education. With 
three factors there are, therefore, three two-factor 
interactions. 


If child status is designated as factor A, the
educational attainment of the wife as factor B and
the husband’s income as factor C, then the three
two-factor interactions are defined as AB, AC and
BC and the one three-factor interaction is ABC.


Now consider a generalised three-factor model,
in which A has r levels, B, s levels and C, t levels,
which postulates that

$$P_{ijk} = \varepsilon + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijk} \qquad (1)$$

where $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$; $k = 1, 2, \ldots, t$;

and where $\alpha_i$, $\beta_j$ and $\gamma_k$ are the true contributions of
the levels i, j, k of factors A, B and C, to the variation
in P, and $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$ and $(\alpha\beta\gamma)_{ijk}$ are
the true contributions of the three two-factor and one
three-factor interactions. $\varepsilon$ is a constant term and
$e_{ijk}$ an error term which, for the purposes of various
tests, is assumed to be normally distributed with
mean zero and variance $\sigma^2$.




There is no unique solution to equation (1)
because the true contributions of, say, the levels of
factor A are not independent of the constant term $\varepsilon$,
and similarly for the other factors. In least squares
terminology there are more unknowns to be found
than there are independent normal equations. But
what can be estimated are the main effects and
interaction effects, which were defined above as the
differences between the contributions of the factor
levels.
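The imbalance between unknowns and observations can be made concrete by counting the terms of equation (1) for r = 3, s = 3 and t = 6. The sketch below is only a counting aid; `count_unknowns` is a hypothetical helper, and the error terms are left out of the count.

```python
from itertools import combinations
from math import prod


def count_unknowns(levels):
    """Count the constant plus one unknown for every level (or combination
    of levels) of each main effect and each interaction."""
    total = 1  # the constant term
    for order in range(1, len(levels) + 1):
        for subset in combinations(levels, order):
            total += prod(subset)
    return total


levels = (3, 3, 6)  # r, s, t: child status, education, husband's income
unknowns = count_unknowns(levels)
observations = prod(levels)
print(unknowns, observations)
```

With these levels there are 112 unknown contributions but only 54 observed rates, which is why constraints have to be placed on the terms before anything can be estimated.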


This is done by placing certain constraints on 
the values which the terms in equation (1) can take. 
There are two ways of doing this which, although 
essentially the same, provide alternative approaches 
to the arithmetic and to the understanding of the 
results. The first, variance analysis, provides a 
simple way of completing the first stage of the anal- 
ysis referred to on page 11 while at the same time 
yielding the data for the second stage. Dummy vari- 
able regression analysis provides a simple means 
of obtaining estimates of the effects, but only if all 
main and interaction effects are specified in the 
model does it provide all the answers yielded by 
variance analysis. 


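Variance Analysis

If $\bar{P}$ is the average of all the participation rates
then with a factorial arrangement the expected value
of $\bar{P}$ is $\mu$, the true overall mean. And from equation
(1) it follows that

$$\mu = \varepsilon + \bar{\alpha} + \bar{\beta} + \bar{\gamma} + \overline{\alpha\beta} + \overline{\alpha\gamma} + \overline{\beta\gamma} + \overline{\alpha\beta\gamma} \qquad (2)$$

Substituting for $\varepsilon$ in equation (1) gives

$$P_{ijk} = \mu + (\alpha_i - \bar{\alpha}) + (\beta_j - \bar{\beta}) + (\gamma_k - \bar{\gamma}) + ((\alpha\beta)_{ij} - \overline{\alpha\beta}) + ((\alpha\gamma)_{ik} - \overline{\alpha\gamma}) + ((\beta\gamma)_{jk} - \overline{\beta\gamma}) + ((\alpha\beta\gamma)_{ijk} - \overline{\alpha\beta\gamma}) + e_{ijk}$$

or

$$P_{ijk} = \mu + A_i + B_j + C_k + (AB)_{ij} + (AC)_{ik} + (BC)_{jk} + (ABC)_{ijk} + e_{ijk} \qquad (3)$$

where

$\mu$ = the true mean participation rate

$A_i$ = the true mean participation rate for
which factor A is at the ith level
minus the true mean

$B_j$, $C_k$ = defined in the same way as for $A_i$

$(AB)_{ij}$ = the true mean participation rate for
which factor A is at the ith level
and factor B is at the jth level
minus $(\mu + A_i + B_j)$

$(AC)_{ik}$, $(BC)_{jk}$ = defined in the same way as for $(AB)_{ij}$

$(ABC)_{ijk}$ = the true mean participation rate for
which factor A is at the ith level,
B is at the jth level and C is at
the kth level, minus
$(\mu + A_i + B_j + C_k + (AB)_{ij} + (AC)_{ik} + (BC)_{jk})$

It follows from the definitions of the terms given
above that

$$\sum_i A_i = \sum_j B_j = \sum_k C_k = \sum_i\sum_j (AB)_{ij} = \sum_i\sum_k (AC)_{ik} = \sum_j\sum_k (BC)_{jk} = \sum_i\sum_j\sum_k (ABC)_{ijk} = 0$$

It was stated above that $\mu$ is the expected value
of $\bar{P}$; similar expressions⁹ can also be formed for
the remaining terms on the right hand side of
equation (3).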


It will also be stated without proof¹⁰ that

$$E\left[\, st \sum_i (\bar{P}_{i..} - \bar{P})^2 \right] = st \sum_i A_i^2 + (r-1)\sigma^2 \qquad (4)$$

where $\sigma^2$ is the variance of the error term $e_{ijk}$. The
expression on the right hand side of equation (4),
when divided by the appropriate degrees of freedom,
(r-1), is the expected value of the mean square of
the deviations of the r true means of factor A about
the overall true mean, and its estimate is¹¹

$$\frac{st \sum_i (\bar{P}_{i..} - \bar{P})^2}{r-1}$$


Similar expressions can be derived for the mean 
squares of the other main effects and interactions 
and these are summarised in Table 2 below. 


Before discussing a little more of the method- 
ology and interpretation of analysis of variance the 
calculations required in Table 2 have been applied 
to the data in Table 1 and these are given below in 
Table 3. 
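The partition of the total sum of squares can be sketched in a few lines. The code below is an illustrative reconstruction, not the author's computing procedure: the rates are transcribed from Table 1 (income classes running from $10,000 and over down to under $1,000), the names `RATES` and `three_way_anova` are hypothetical, and the sums of squares for the three main effects should come out close to those reported in Table 3.

```python
from itertools import product

R, S, T = 3, 3, 6  # levels of child status, education, husband's income

# RATES[i][j][k]: participation rate (per cent) transcribed from Table 1.
RATES = [
    [[7.53, 6.08, 10.14, 17.40, 26.33, 23.10],     # children under 6
     [4.60, 5.79, 13.01, 21.33, 27.86, 29.81],
     [6.63, 7.82, 17.03, 27.97, 31.54, 35.71]],
    [[13.17, 15.04, 24.21, 32.31, 38.12, 38.42],   # no children under 6
     [10.19, 21.03, 34.72, 45.61, 52.46, 49.74],
     [10.53, 28.47, 52.05, 54.03, 63.57, 36.36]],
    [[17.88, 14.09, 24.35, 31.61, 35.25, 38.98],   # no children
     [14.64, 32.52, 48.34, 57.51, 57.49, 58.86],
     [20.24, 43.53, 62.15, 65.66, 61.92, 72.00]],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def three_way_anova(p):
    """Return the overall mean and the sums of squares of Table 2."""
    ia, jb, kc = range(R), range(S), range(T)
    grand = mean(p[i][j][k] for i, j, k in product(ia, jb, kc))
    a = [mean(p[i][j][k] for j, k in product(jb, kc)) for i in ia]
    b = [mean(p[i][j][k] for i, k in product(ia, kc)) for j in jb]
    c = [mean(p[i][j][k] for i, j in product(ia, jb)) for k in kc]
    ab = [[mean(p[i][j][k] for k in kc) for j in jb] for i in ia]
    ac = [[mean(p[i][j][k] for j in jb) for k in kc] for i in ia]
    bc = [[mean(p[i][j][k] for i in ia) for k in kc] for j in jb]
    ss = {
        "A": S * T * sum((a[i] - grand) ** 2 for i in ia),
        "B": R * T * sum((b[j] - grand) ** 2 for j in jb),
        "C": R * S * sum((c[k] - grand) ** 2 for k in kc),
        "AB": T * sum((ab[i][j] - a[i] - b[j] + grand) ** 2
                      for i, j in product(ia, jb)),
        "AC": S * sum((ac[i][k] - a[i] - c[k] + grand) ** 2
                      for i, k in product(ia, kc)),
        "BC": R * sum((bc[j][k] - b[j] - c[k] + grand) ** 2
                      for j, k in product(jb, kc)),
        "ABC": sum((p[i][j][k] - ab[i][j] - ac[i][k] - bc[j][k]
                    + a[i] + b[j] + c[k] - grand) ** 2
                   for i, j, k in product(ia, jb, kc)),
        "Total": sum((p[i][j][k] - grand) ** 2
                     for i, j, k in product(ia, jb, kc)),
    }
    return grand, ss


grand, ss = three_way_anova(RATES)
print(f"overall mean {grand:.2f}")
for source, value in ss.items():
    print(f"{source:6s}{value:10.1f}")
```

Because the arrangement is completely balanced, the seven component sums of squares add up exactly to the total, which is the property the decomposition in Tables 2 and 3 relies on.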


From Tables 2 and 3 it can be seen how the
total sum of squares has been divided between the
main effects and interactions. It can also be seen
from the last column of Table 2 that the expectation
of the mean squares is, in each case, the sum of
two terms, one of which is the variance of the error
term. If it was required to provide an independent
estimate of the error variance then replicate
observations would be needed in each cell and differences
between these estimates would provide the required
variance estimate. But in the absence of replicate
observations the significance of the effects can still
be tested if some a priori assumption is made about
the presence of the higher order interaction effects.
For, in the example, if it is assumed that there is
no ABC interaction then the estimate of

$$\frac{\sum_i\sum_j\sum_k (ABC)_{ijk}^2}{(r-1)(s-1)(t-1)} + \sigma^2$$

provides an independent estimate of $\sigma^2$, since
$\sum_i\sum_j\sum_k (ABC)_{ijk}^2 = 0$, and can be used as such.¹²


9 See O.L. Davies (ed.), The Design and Analysis
of Industrial Experiments, London, 1954, page 281.

10 Ibid., page 282.

11 The reason why the sum of squares $\sum_i (\bar{P}_{i..} - \bar{P})^2$ is
multiplied by st is because each $\bar{P}_{i..}$ is the mean of st
observations.

12 It is also permissible to pool estimates of the
mean squares of high order interactions to provide an
estimate of the error variance providing that the one or
more mean squares selected for this purpose were chosen
before the data were analysed, i.e. there should be again
an a priori assumption that the interactions are not
significant.
TABLE 2. Expectation of Mean Squares in Analysis of Variance

Source of     Sum of squares                                                  Degrees of          Expectation of mean square
variation                                                                     freedom

Main effects:
  A           $st \sum_i (\bar{P}_{i..} - \bar{P})^2$                              $r-1$               $\frac{st \sum_i A_i^2}{r-1} + \sigma^2$
  B           $rt \sum_j (\bar{P}_{.j.} - \bar{P})^2$                              $s-1$               $\frac{rt \sum_j B_j^2}{s-1} + \sigma^2$
  C           $rs \sum_k (\bar{P}_{..k} - \bar{P})^2$                              $t-1$               $\frac{rs \sum_k C_k^2}{t-1} + \sigma^2$

Interactions:
  AB          $t \sum_i\sum_j (\bar{P}_{ij.} - \bar{P}_{i..} - \bar{P}_{.j.} + \bar{P})^2$      $(r-1)(s-1)$        $\frac{t \sum_i\sum_j (AB)_{ij}^2}{(r-1)(s-1)} + \sigma^2$
  AC          $s \sum_i\sum_k (\bar{P}_{i.k} - \bar{P}_{i..} - \bar{P}_{..k} + \bar{P})^2$      $(r-1)(t-1)$        $\frac{s \sum_i\sum_k (AC)_{ik}^2}{(r-1)(t-1)} + \sigma^2$
  BC          $r \sum_j\sum_k (\bar{P}_{.jk} - \bar{P}_{.j.} - \bar{P}_{..k} + \bar{P})^2$      $(s-1)(t-1)$        $\frac{r \sum_j\sum_k (BC)_{jk}^2}{(s-1)(t-1)} + \sigma^2$
  ABC         $\sum_i\sum_j\sum_k (P_{ijk} - \bar{P}_{ij.} - \bar{P}_{i.k} - \bar{P}_{.jk} + \bar{P}_{i..} + \bar{P}_{.j.} + \bar{P}_{..k} - \bar{P})^2$      $(r-1)(s-1)(t-1)$   $\frac{\sum_i\sum_j\sum_k (ABC)_{ijk}^2}{(r-1)(s-1)(t-1)} + \sigma^2$

Total         $\sum_i\sum_j\sum_k (P_{ijk} - \bar{P})^2$                          $rst-1$


TABLE 3. Variance Analysis of Labour Force Participation Rates

                                      Sum of     Degrees of     Mean       Variance
                                      squares    freedom        squares    ratio¹

Main effects:
  Child status                        5,560.0         2         2,780.0     121.4
  Education                           2,260.9         2         1,130.5      49.4
  Income of husband                   7,860.5         5         1,572.1      68.7

Second-order interactions:
  Child status/education                744.8         4           186.2       8.1
  Child status/income                   642.5        10            64.2       2.8
  Education/income                      662.3        10            66.2       2.9

Third-order interaction:
  Child status/education/income         457.8        20            22.9

Total                                18,188.8        53

1 Obtained by dividing the mean squares by the mean square of the third-order interaction.




If the main effects and interactions did not
exist, i.e. variations in child status, education and
husband’s income had no effect on the wife’s
participation rate, then all the mean squares would also be
independent estimates of $\sigma^2$.¹³ It therefore follows
that if any factor or interaction does affect the level
of the participation rate then the observed mean
square will be larger than the estimate of $\sigma^2$. The
test of significance employed is the standard F test
for the ratio of two variances.


In the last column of Table 3 the ratios of mean
squares of the main effects and second-order
interaction effects, to the mean square of the third-order
interaction (here assumed to be an estimate of $\sigma^2$)
are given. Tables of the F distribution for the
appropriate degrees of freedom can then be referred to
to see if the ratio is significant. Table 4 compares
the observed ratios with those expected from the F
distribution with the degrees of freedom shown in
the table.


These ratios show that not only are the main 
effects highly significant, as was to be expected in 
this case, but the interaction of child status with 
education is also significant at the one per cent 
level. In other words the hypothesis that the effect 


13 Ibid., page 256.


of child status on the wife’s labour force participation
rate is the same at different levels of her education
has to be rejected. The interaction effects of child
status with income, and of education with income,
also appear from the table to be significant at the
five per cent level. For completeness, however, and
to further illustrate the use of analysis of variance 
in analysing cross classified data, Table 5 below 
contains a complete list of the 54 observations 
showing how they can be built up from the main and 
interaction effects defined by equation (3). 
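The comparison of variance ratios with the F distribution can be sketched as follows. The mean squares and degrees of freedom are taken from Table 3; the critical values are an assumption, copied from standard published tables of the upper 5 and 1 per cent points of F with 20 denominator degrees of freedom and not from this study. The names `MEAN_SQUARES`, `CRITICAL` and `significance` are illustrative.

```python
# (numerator degrees of freedom, mean square) for each effect, from Table 3.
MEAN_SQUARES = {
    "child status": (2, 2780.0),
    "education": (2, 2260.9 / 2),
    "income of husband": (5, 7860.5 / 5),
    "child status/education": (4, 186.2),
    "child status/income": (10, 64.2),
    "education/income": (10, 66.2),
}
ERROR_DF = 20             # degrees of freedom of the third-order interaction
ERROR_MEAN_SQUARE = 22.9  # its mean square, used as the estimate of sigma^2

# ASSUMPTION: (5 per cent point, 1 per cent point) of F for the given
# (numerator, denominator) degrees of freedom, from standard tables.
CRITICAL = {
    (2, 20): (3.49, 5.85),
    (4, 20): (2.87, 4.43),
    (5, 20): (2.71, 4.10),
    (10, 20): (2.35, 3.37),
}


def significance(df, mean_square):
    """Return the variance ratio and the strictest level it exceeds."""
    ratio = mean_square / ERROR_MEAN_SQUARE
    five_per_cent, one_per_cent = CRITICAL[(df, ERROR_DF)]
    if ratio > one_per_cent:
        level = "1 per cent"
    elif ratio > five_per_cent:
        level = "5 per cent"
    else:
        level = "not significant"
    return ratio, level


results = {name: significance(df, ms) for name, (df, ms) in MEAN_SQUARES.items()}
for name, (ratio, level) in results.items():
    print(f"{name:26s}{ratio:7.1f}  {level}")
```

Under these assumed critical values the child status/education interaction is classed as significant at the one per cent level and the two interactions with income at the five per cent level, in line with the discussion above.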


To illustrate the effect of the child status/education
interaction the nine estimates of
$\hat{\mu} + \hat{A}_i + \hat{B}_j + (\widehat{AB})_{ij}$, obtained from Table 5, are
plotted on Chart 1 together with the average education
effect, $\hat{\mu} + \hat{B}_j$. This suggests a reason for the
significance of this interaction. When there are
children under six in the family the effect of the
wife’s education on the likelihood of her being in
the labour force is small relative to the importance
of this factor when there are no children.
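The nine quantities plotted on the chart are simply the average participation rates for each child status and education level, taken over the six income classes. The sketch below computes them from the rates in Table 1; the dictionary layout and the name `spread` (university mean minus elementary mean) are illustrative choices.

```python
# Participation rates (per cent) from Table 1; the six entries in each list
# run from the highest to the lowest income class of the husband.
RATES = {
    ("children under 6", "elementary"): [7.53, 6.08, 10.14, 17.40, 26.33, 23.10],
    ("children under 6", "secondary"): [4.60, 5.79, 13.01, 21.33, 27.86, 29.81],
    ("children under 6", "university"): [6.63, 7.82, 17.03, 27.97, 31.54, 35.71],
    ("no children under 6", "elementary"): [13.17, 15.04, 24.21, 32.31, 38.12, 38.42],
    ("no children under 6", "secondary"): [10.19, 21.03, 34.72, 45.61, 52.46, 49.74],
    ("no children under 6", "university"): [10.53, 28.47, 52.05, 54.03, 63.57, 36.36],
    ("no children", "elementary"): [17.88, 14.09, 24.35, 31.61, 35.25, 38.98],
    ("no children", "secondary"): [14.64, 32.52, 48.34, 57.51, 57.49, 58.86],
    ("no children", "university"): [20.24, 43.53, 62.15, 65.66, 61.92, 72.00],
}

# Average over income: the estimate of mu + A_i + B_j + (AB)_ij for each cell.
cell_mean = {key: sum(rates) / len(rates) for key, rates in RATES.items()}

# Rise in the average rate between elementary and university education.
spread = {
    status: cell_mean[(status, "university")] - cell_mean[(status, "elementary")]
    for status in ("children under 6", "no children under 6", "no children")
}

for status, value in spread.items():
    print(f"{status:22s}{value:6.1f}")
```

The rise associated with education is smallest where there are children under six and largest where there are no children, which is the pattern the interaction reflects.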


It might appear from the above description of 
the technique of analysis of variance that it pro- 
vides many of the answers to the problems that are 
encountered in the analysis of cross sectional data 
providing that the data can be arranged in the form 
of a ‘‘factorial design’’. If this is so it is because 


TABLE 4. Variance Ratios and F Distribution

                              Variance     Degrees of     F distribution
Effect                        ratio¹       freedom²       level of significance

Child status                    121.4      2 and 20       [illegible]
Wife’s education                 49.4      2 and 20       [illegible]
Husband’s income                 68.7      5 and 20       [illegible]
Child status/education            8.1      4 and 20       [illegible]
Child status/income               2.8      10 and 20      [illegible]
Education/income                  2.9      10 and 20      [illegible]

1 Mean square of effect divided by the mean square of the third-order interaction.
2 The appropriate F distribution is defined by the degrees of freedom associated with the two variances.
TABLE 5. Contribution of Main Effects and Interaction Effects to Original Observations

For each factor/level combination (A. child status, B. education, C. husband’s income) the original
observation is the sum of the constant term, the estimated main effects and the estimated
interaction effects shown below. Income classes run from $10,000 and over to under $1,000.

Constant term (overall mean): 31.42

Estimated main effects

  A. Child status         Children under 6   No children under 6   No children
                          [illegible]        + 3.03                + 10.64

  B. Education            Elementary         Secondary             University
                          - 8.42             + 1.11                + 7.31

  C. Husband’s income     $10,000     $7,000-    $5,000-    $3,000-    $1,000-    Under
                          and over    $9,999     $6,999     $4,999     $2,999     $1,000
                          - 19.71     - 12.05    + 0.36     + 7.85     + 12.42    + 11.13

Estimated interaction effects

  (AB): [illegible]

                                     $10,000     $7,000-    $5,000-    $3,000-    $1,000-    Under
                                     and over    $9,999     $6,999     $4,999     $2,999     $1,000
  (AC)
  Children under 6                   [illegible]
  No children under 6                - 3.44      - 0.89     + 2.19     + 1.68     + 4.51     - 4.07
  No children                        - 4.76      + 0.04     + 2.53     + 1.68     - 2.93     + 3.42

  (BC)
  Elementary                         + 9.57      + 0.79     - 3.79     - 3.74     - 2.19     - 0.63
  Secondary                          - 3.01      - 0.70     - 0.87     + 1.10     + 0.99     + 2.48
  University                         - 6.55      - 0.07     + 4.65     + 2.64     + 1.19     - 1.84

  (ABC)
  Children under 6, elementary       [illegible]
  Children under 6, secondary        [illegible]
  Children under 6, university       [illegible] [illegible] - 4.37    - 0.26     - 1.59     + 4.65
  No children under 6, elementary    - 0.13      + 0.31     - 1.43     - 0.36     - 3.50     + 5.11
  No children under 6, secondary     + 0.73      - 0.95     - 2.58     - 0.64     - 1.08     + 4.58
  No children under 6, university    - 0.60      + 0.65     + 4.02     + 1.03     + 4.62     - 9.69
  No children, elementary            + 5.75      - 1.72     - 1.78     - 1.21     + 0.92     - 1.97
  No children, secondary             - 2.77      + 0.34     + 1.43     + 1.99     + 2.12     - 3.06
  No children, university            - 2.99      + 1.36     + 0.36     - 0.76     - 3.01     + 5.04

Original observations
  Children under 6, elementary         7.53        6.08      10.14      17.40      26.33      23.10
  Children under 6, secondary          4.60        5.79      13.01      21.33      27.86      29.81
  Children under 6, university         6.63        7.82      17.03      27.97      31.54      35.71
  No children under 6, elementary     13.17       15.04      24.21      32.31      38.12      38.42
  No children under 6, secondary      10.19       21.03      34.72      45.61      52.46      49.74
  No children under 6, university     10.53       28.47      52.05      54.03      63.57      36.36
  No children, elementary             17.88       14.09      24.35      31.61      35.25      38.98
  No children, secondary              14.64       32.52      48.34      57.51      57.49      58.86
  No children, university             20.24       43.53      62.15      65.66      61.92      72.00


CHART 1. Average Labour Force Participation Rates of Married Women in Urban Ontario
by Education Only and by Education and Child Status

[Chart: labour force participation rate (per cent) plotted against educational attainment
(elementary, secondary, university); the curves are labelled by child status and include
‘‘no children’’ and ‘‘children, none under 6’’.]


the model used is really a very simple one and in 
the example only three factors were examined. When 
more factors are included the analysis necessarily 
becomes more involved and the interpretation more 
difficult. With three factors there are three two- 
factor interactions, with four factors there are six 
and with five factors there are ten. There are also
ten three-factor interactions with five factors,
compared with only the one in the example used in this
study. But even if it is not proposed to complete the
analysis in the form described above but rather to
use the approach described in the next section, it is
still worthwhile to perform an initial analysis of
variance when there are a large number of factors.
This permits an investigation to be made of the way
in which the total variation (sums of squares) has
been made up from the main and interaction effects.
This is useful because if a factor persistently
appears in interactions which have large mean
squares it might be desirable to control for this
factor in the subsequent analysis.
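The growth in the number of interactions follows from counting the ways of choosing two or three factors out of the total. A minimal sketch, in which `interaction_count` is a hypothetical helper name:

```python
from math import comb


def interaction_count(factors, order):
    """Number of distinct interactions involving `order` of the factors."""
    return comb(factors, order)


for factors in (3, 4, 5):
    two_factor = interaction_count(factors, 2)
    three_factor = interaction_count(factors, 3)
    print(factors, two_factor, three_factor)
```

For three, four and five factors this gives three, six and ten two-factor interactions, and one, four and ten three-factor interactions.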


Dummy Variable Regression 


There may be situations where it is unnecessary
to use the full analysis of variance approach.
For example, it might be assumed, from earlier 
studies, that interaction terms are not likely to be 


14 See Sylvia Ostry, op. cit.


SA: 


significant. Or it may be that the factor or factors 
which are associated with large interaction effects 
have in some way been controlled for. In these 
cases an alternative approach to the analysis can 
be used.’® | 


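Suppose that the interaction terms in equation (2) are discarded from the model. The model is then

$$P_{ijk} = \epsilon + \alpha_i + \beta_j + \gamma_k + e_{ijk} \qquad (5)$$

where $\epsilon$, $\alpha_i$, $\beta_j$, $\gamma_k$ and $e_{ijk}$ are as before.

Now if $\alpha_i$ is replaced by

$$\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_r x_r$$

where $x_i$ has the value of 1 when factor A is at the ith level and 0 otherwise, and $\beta_j$ and $\gamma_k$ are similarly replaced by

$$\beta_1 y_1 + \beta_2 y_2 + \dots + \beta_s y_s \quad \text{and} \quad \gamma_1 z_1 + \gamma_2 z_2 + \dots + \gamma_t z_t$$

then since $r = 3$, $s = 3$ and $t = 6$, equation (5) can be rewritten

$$P = \epsilon + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_3 + \gamma_1 z_1 + \gamma_2 z_2 + \gamma_3 z_3 + \gamma_4 z_4 + \gamma_5 z_5 + \gamma_6 z_6 + e \qquad (6)$$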
where $\epsilon$ = a constant term
$x_1$ = 1 when child status is at the first level, 0 otherwise
$x_2$ = 1 when child status is at the second level, 0 otherwise
$x_3$ = 1 when child status is at the third level, 0 otherwise
$y_1$ = 1 when education is at the first level, 0 otherwise
⋮


15 See E. Malinvaud, Statistical Methods of Econometrics, Vol. 5 in the series: Studies in Mathematical and Managerial Economics, Chicago, 1966; A.S. Goldberger, Econometric Theory, New York, 1964; E. Melichar, Least-Squares Analysis of Economic Survey Data, Board of Governors of the Federal Reserve System, (mimeo); J. Johnston, Econometric Methods, New York, 1963.


$z_1$ = 1 when income is at the first level, 0 otherwise
⋮
$z_6$ = 1 when income is at the sixth level, 0 otherwise


Now, for reasons mentioned on page 12, there is no unique solution to this equation. But as also mentioned earlier, what can be estimated are the differences between contributions of the different levels of a factor. The constant term can then be arbitrarily fixed to facilitate the interpretation of the results. A number of approaches to the arithmetic have been designed to produce the desired results using standard regression analysis programmes. One of these is described below.
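
The lack of a unique solution can be made concrete with a small Python sketch. It builds the dummy variables of equation (6) for every cell of a 3 x 3 x 6 layout (child status, education, income) and checks that, within each factor, the dummies always add up to the constant column. The helper names are illustrative assumptions and the cell ordering is arbitrary.

```python
from itertools import product

# Number of levels of each factor in the example: child status, education, income.
LEVELS = (3, 3, 6)


def dummy_row(cell):
    """Return [constant, x1..x3, y1..y3, z1..z6] for one cell (levels are 1-based)."""
    row = [1]
    for n_levels, level in zip(LEVELS, cell):
        row.extend(1 if level == i else 0 for i in range(1, n_levels + 1))
    return row


def factor_dummies_sum_to_constant(row):
    """Check x1+x2+x3 = y1+y2+y3 = z1+...+z6 = constant for one row."""
    start = 1
    for n_levels in LEVELS:
        if sum(row[start:start + n_levels]) != row[0]:
            return False
        start += n_levels
    return True


cells = list(product(*(range(1, n + 1) for n in LEVELS)))
design = [dummy_row(cell) for cell in cells]
n_columns = len(design[0])

all_dependent = all(factor_dummies_sum_to_constant(row) for row in design)
# One exact linear dependency per factor, so this many parameters remain estimable.
n_estimable = n_columns - len(LEVELS)

print(len(design), n_columns, all_dependent, n_estimable)
```

There are 13 columns for 54 cells but three exact linear dependencies, leaving 10 separately estimable parameters, which is why one coefficient per factor has to be constrained.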


The number of parameters to be estimated can be reduced to the number of independent normal equations by constraining one coefficient in each factor to be equal to zero, say

$$\alpha_2 = \beta_2 = \gamma_2 = 0$$

so that equation (6) becomes

$$P = \epsilon + \alpha_1 x_1 + \alpha_3 x_3 + \beta_1 y_1 + \beta_3 y_3 + \gamma_1 z_1 + \gamma_3 z_3 + \gamma_4 z_4 + \gamma_5 z_5 + \gamma_6 z_6 + e \qquad (7)$$

Equation (7) is now a reparameterised version of the model given by equation (6). Each coefficient now measures the expected difference between the participation rates of married women associated with that level of the factor and of those associated with the omitted level. The estimated values, $a_i$, $b_j$ and $c_k$ ($i$, $j$ and $k \neq 2$), of the coefficients and of the constant term can then be obtained in the normal way.


However, in order to simplify the presentation it will often be found useful to modify the results so that the constant term takes on the value of the overall mean of the observations. This can easily be done as follows:

For the first factor obtain a value

$$k_a = \frac{a_1 + a_2 + a_3}{3} \qquad (a_2 = 0)$$

where the denominator is the number of levels of that factor; then obtain $k_b$ and $k_c$ for the other factors in the same way. It can now be stated that¹⁶

$$\hat{\epsilon} + k_a + k_b + k_c = \bar{P}$$

In this way the overall mean can be selected as one estimate of $\epsilon$ in equation (6) with the associated set of estimates, $a_i'$, $b_j'$ and $c_k'$, of the coefficients in equation (6) found using the following rule

$$a_1' = a_1 - k_a$$
$$a_2' = -k_a$$
$$a_3' = a_3 - k_a$$

and similarly for the other factors.

16 See references cited in Footnote (15) and in particular E. Melichar, Least-Squares Analysis of Economic Survey Data, op. cit.

Since $\sum a_i' = \sum b_j' = \sum c_k' = 0$ the interpretation of the results is that the constant term is the mean participation rate and the coefficients the expected departures from that mean associated with the different levels of each factor.
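
The adjustment rule can be sketched in a few lines of Python. The input values below are not reported in the study: they are back-derived, for illustration only, from the child status coefficients of Table 6 by treating the second level as the omitted one and ignoring the other two factors. The function name is likewise an assumption.

```python
def deconstrain(constant, constrained):
    """Apply the adjustment rule to constrained estimates.

    constrained maps factor name -> {level: coefficient}, where the omitted
    level is included with coefficient 0.0. For each factor the mean
    coefficient k is subtracted from every level and added to the constant.
    """
    new_constant = constant
    adjusted = {}
    for factor, coefs in constrained.items():
        k = sum(coefs.values()) / len(coefs)
        new_constant += k
        adjusted[factor] = {level: c - k for level, c in coefs.items()}
    return new_constant, adjusted


# Hypothetical constrained estimates (omitted level: no children under 6).
constrained = {
    "child_status": {
        "children under 6": -16.686,
        "no children under 6": 0.0,
        "no children": 7.611,
    }
}
new_constant, adjusted = deconstrain(34.446, constrained)

print(round(new_constant, 3))
print({level: round(c, 3) for level, c in adjusted["child_status"].items()})
```

Under these assumed inputs the adjusted constant is 31.421 and the adjusted coefficients are -13.661, +3.025 and +10.636, which sum to zero as the rule requires.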


It only remains to consider briefly the relation- 
ship between the proportion of the total variation 
‘explained’ by the dummy variable regression 
models and the components of the variance analysis 
model given in Table 2 on page 13. 


First, consider again the model postulated as 
equation (6) in which all factor/levels are repre- 
sented by dummy variables. The percentage of the 
total variation explained by each of the factors 
separately is given by the formula for the sum of 
squares attributable to the main effects of each 
factor shown in Table 2. It follows that the total 
explained variance obtained from the regression 
model described by equation (7) is equal to the sum 
of the first three terms in column 2 of Table 2. The 
residual sum of squares obtained from equation (7) 
will, as has been stated earlier, be the sum of 
squares due to the interaction effects or last 4 terms 
in column 2 of Table 2. 
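
As a rough check on this partition, the proportion of variation explained by the additive model can be recomputed from the sums of squares. The sketch below uses the total and interaction sums of squares as they appear in Table 7 of this study; the variable names are illustrative.

```python
# Sums of squares as printed in Table 7 (untransformed participation rates).
TOTAL_SS = 18188.8
SECOND_ORDER_SS = 2049.5
THIRD_ORDER_SS = 457.8

# With no interaction terms in the model, the interactions form the residual.
residual_ss = SECOND_ORDER_SS + THIRD_ORDER_SS
explained_ss = TOTAL_SS - residual_ss
r_squared = explained_ss / TOTAL_SS

print(round(residual_ss, 1), round(explained_ss, 1), round(r_squared, 3))
```

The resulting proportion, about 0.862, agrees with the 86.2 per cent quoted for Model I.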


Now consider the situation where one of the independent factors in the equation is represented by a continuous variable. In this case it is still necessary for one of the coefficients among each of the remaining sets of the dummy variables to be constrained to zero and, if desired, the results obtained can still be adjusted in the way described above to provide coefficients for all the factor/levels. Because the constant term in such a model will not become the overall mean after following such a procedure, the interpretation of the results will not be so simplified.¹⁷ However, in this study, where both dummy and continuous variables have been used, the procedure outlined above has been followed so as to maintain a consistent form of presentation. The sum of squares attributable to the factor represented by a continuous variable will now be lower than if dummy variables had been used. The difference between these two sums of squares can then be thought of as a measure of how far the assumptions made about the form which the continuous variable takes reflect the free form obtained from the use of dummy variables.¹⁸

17 It is interesting to note that the coefficients obtained for a set of dummy variables, after they have been adjusted in the way proposed earlier, in an equation containing at least one continuous variable, are still the deviations from the overall mean even though the overall mean does not appear in the equation.


Before leaving this section it is necessary to mention a point concerning the standard errors of the estimates obtained from the regression analysis of dummy variables. The coefficients obtained from fitting the dummy variable regression model of equation (7) are the differences between the contributions of the factor at the designated levels and that at the omitted level. The standard errors of the coefficients are the standard errors of these differences. These standard errors are therefore unchanged when the coefficients are deconstrained to the form of equation (6) since the differences between the coefficients do not change. Furthermore, if the data are arranged in the form of a completely balanced factorial design, as is the case in this study, then the standard errors obtained will be identical for all levels of the same factor and will be the standard error of the difference between the coefficients of any two levels of that factor.¹⁹

18 In certain cases, using the principles of orthogonal polynomials (see Davies, op. cit.), it is possible to obtain the partitioning of the sum of squares for a factor directly from the variance analysis.


19 Since the differences between the contributions of the different factor levels are the only effects which can be estimated the statements made in this paragraph are not surprising. They have been included because it is a point which appears to have been neglected in the literature.


TABLE 6. Regression Equations of Labour Force Participation Rates of Married Women in Urban Ontario

Coefficients of: Child status | Education | Husband's income ($'000)

Model I: R² = 0.8622, N = 54, constant 31.421
Children under 6 - 13.661 | Elementary - 8.421 | 10+ - 19.709
No children under 6 + 3.025 | Secondary [illegible] | 7-10 [illegible]
No children + 10.636 | University [illegible] | 5-7 + 0.357
 | | 3-5 + 7.849
 | | 1-3 + 12.417
 | | 0-1 [illegible]
(Standard error of coefficients) (2.516) | (2.516) | (3.559)

Model II: R² = [illegible], N = 54, constant 45.377
Children under 6 - 13.661 | Elementary - 8.421 | Continuous variable
No children under 6 + 3.025 | Secondary + 1.107 | $'000 natural scale - 2.263
No children + 10.636 | University [illegible] |
(Standard error of coefficients) (2.705) | [illegible] | [illegible]

Model III: R² = [illegible], N = 54, constant 43.731
Children under 6 - 13.661 | Elementary - 8.421 | Continuous variable
No children under 6 + 3.025 | Secondary + 1.107 | $'000 on log₂ scale [illegible]
No children + 10.636 | University [illegible] |
(Standard error of coefficients) (3.326) | (3.326) | (0.845)


Table 6 displays the results of fitting the dummy variable regression model of equation (6) to the data used in this study. It will be noted that over 86 per cent of the total variation in labour force participation rates is explained by this model which
would suggest that for practical purposes it would 
meet the needs of most research workers. It was 
noted above that each point estimate is made up by 
the addition of four terms: a constant term which is 
the average of all the observations and one term for 
each factor depending on the level of that factor. 
Thus the estimated labour force participation rate 
of married women in urban Ontario who: 


(1) have children less than 6 years of age;

(2) have only elementary school education;

(3) have husbands whose income was between $5,000 and $7,000 a year;

is (see Table 6, Model I)


31.42 - 13.66 - 8.42 + 0.36 = 9.70 per cent 


which compares very favourably with an actual 
figure of 10.14 per cent. 


Before proceeding, however, it may be worthwhile to compare the results obtained from this dummy variable regression model and those from the variance analysis model. First compare the coefficients given in Table 6 Model I with the contributions of the main effects shown in Table 5. That they agree is as it should be, by definition; but this comparison makes it possible to see what effects have not been picked up by assuming the relationship takes the form given by equation (6). In particular the significant interaction effect of child status with educational attainment noted on page 14 and in Chart 1 is ignored.


But another repercussion of the failure to 
include significant interaction terms, or to control 
for them, which was referred to earlier, is the effect 
that this has on the significance which can be 
attached to the coefficients. For example, on 
page 12 it was shown that if it were assumed that 
the third-order interaction effects were not signifi- 
cant then the value of the mean square obtained for 
this interaction was an independent estimate of 
the variance of the error term. And from the formula 
for the standard error of the difference between two 
means it can be shown that on this basis a differ- 
ence of 4.7 percentage points is required between 
the participation rates at the different income levels 
before they could be said to be significant at the 
5 per cent level. However, using the standard error
obtained from pooling the mean squares of all the
interaction terms, participation rates at any two
income levels must differ by more than 7.2 percentage
points before the null hypothesis is rejected.
It so happens that the increase in the difference 
required for significance, under the two sets of 
assumptions, from 4.7 to 7.2 would not, given the 
data used in this study, change the general con- 
clusions concerning the effect of the income of 
husbands on the wife’s labour force participation 
rate. However, it is clear that this may not always 
be the case. 
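
The two thresholds can be approximately reproduced from the interaction sums of squares in Table 7. In the Python sketch below, the figure of nine observations per income level follows from the 54 cells and six income classes, while the two-sided 5 per cent critical values of Student's t (2.086 for 20 degrees of freedom, 2.015 for 44) are standard table values supplied here as assumptions, not figures quoted in the study.

```python
from math import sqrt

OBS_PER_INCOME_LEVEL = 9  # 54 cells spread evenly over 6 income levels


def least_significant_difference(sum_of_squares, df, t_critical,
                                 n_per_level=OBS_PER_INCOME_LEVEL):
    """Smallest difference between two level means significant at the
    level implied by t_critical, using sum_of_squares / df as error variance."""
    mean_square = sum_of_squares / df
    se_difference = sqrt(2.0 * mean_square / n_per_level)
    return t_critical * se_difference


# Error variance from the third-order interaction only (20 degrees of freedom).
lsd_third_order = least_significant_difference(457.8, 20, t_critical=2.086)
# Error variance from all interaction terms pooled (44 degrees of freedom).
lsd_pooled = least_significant_difference(2507.3, 44, t_critical=2.015)

print(round(lsd_third_order, 1), round(lsd_pooled, 1))
```

Under these assumptions the two values come out at roughly 4.7 and 7.2 percentage points, in line with the figures in the text.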


Another illustration of the disadvantage of using a dummy variable model without interaction terms can be provided. If in the example worked out above, while holding the other factor/levels constant, the husband’s income is changed from $5,000-$6,999 to $10,000 and over, then the new estimated participation rate becomes:


31.42 - 13.66 - 8.42 - 19.71 = -10.37 per cent


which is not only far from the observed percentage of 7.53 but is obviously an absurd result.
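
Both point estimates from Model I can be reproduced with a short Python sketch. Only coefficients that are legible in Table 6 and used in the worked examples are included; the dictionary layout and function names are illustrative assumptions.

```python
MODEL_I = {
    "constant": 31.421,
    "child_status": {
        "children under 6": -13.661,
        "no children under 6": 3.025,
        "no children": 10.636,
    },
    "education": {"elementary": -8.421},
    "income": {"10+": -19.709, "5-7": 0.357, "3-5": 7.849, "1-3": 12.417},
}


def estimate(model, child_status, education, income):
    """Additive estimate: constant plus one term per factor."""
    return (model["constant"]
            + model["child_status"][child_status]
            + model["education"][education]
            + model["income"][income])


def is_admissible(rate):
    """A participation rate in per cent must lie between 0 and 100."""
    return 0.0 <= rate <= 100.0


middle = estimate(MODEL_I, "children under 6", "elementary", "5-7")
top = estimate(MODEL_I, "children under 6", "elementary", "10+")

print(round(middle, 2), is_admissible(middle))
print(round(top, 2), is_admissible(top))
```

The first estimate is 9.70 per cent; the second, -10.37 per cent, falls outside the admissible range, which is the weakness the text points to.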


It is evident therefore, that although wives with 
husbands whose incomes are $10,000 a year and 
over have, on average, a propensity to be in the 
labour force some 20 percentage points lower than 
that of wives with husbands earning between $5,000 
and $6,999 a year, this relationship does not hold 
over all the combinations of the other factor/levels. 
But before considering in some detail how this par- 
ticular problem can be partially taken care of while 
still retaining the basic form of a simple additive 
model, two other forms of this model were tested. 


Child status and, in the absence of a numeric 
scale, education have to be included in the model 
as dummy variables, but this does not apply to the 
income variable. Certainly the use of dummy vari- 
ables to represent the income categories, or levels, 
does allow the resulting coefficients to take a com- 
pletely free form and thus provide better estimates 
of the observed factor/level combinations. However, 
it may be that research workers are interested in
estimating the participation rates associated with
other income levels within or outside the range of
observed values. This can be done by interpolation
or, with doubtful justification, extrapolation from the
coefficients obtained in Model I. Alternatively,
income can be treated as a continuous variable.
The latter procedure is the one used in this study.


The second model included the husband’s 
income as a variable on a linear, or natural scale 
and in the third model the logarithm of the husband’s 
income was used. The reason for using the logarithm 
of the husband’s income will be explained later. 
Because the classification by income was only 
available for those class intervals given earlier, 
the mid-points of these intervals were taken to 
represent the points on a continuous income scale. 
However, as this was not possible for incomes of $10,000 and over, other evidence was obtained which showed that the average income for this class was likely to be close to $16,000 and this figure was used. Also, instead of taking logarithms to the base 10, those to the base 2 were used as a matter of convenience (when the incomes were in $’000’s), so that the resulting coefficients would be the expected change in the participation rates arising out of a doubling in the income of the husband. The forms of these two models, Models II and III, were therefore

$$P = \epsilon + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_3 + \gamma z + e \qquad (8)$$

and

$$P = \epsilon + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_3 + \gamma \log_2 z + e \qquad (9)$$

where the x’s and y’s are as defined above in equation (6), and z is the income of the husband.

The parameters, or coefficients, of Models II and III were obtained using the method of least squares after one of each of the ‘‘child status’’ and ‘‘educational attainment’’ levels had been constrained to be equal to zero. And for reasons mentioned earlier the results given in Table 6 for these Models are again those calculated after ‘‘deconstraining’’ the coefficients by the method described above. But, as indicated, the constant term will not now become the overall mean of the participation rates.


Compared with Model I, which explained 86.2 per cent of the total variation in labour force participation rates, Models II and III are rather less efficient, explaining 82.6 and 73.7 per cent respectively. The contributions of child status and education to the expected value of the participation rate and hence to the variation in participation rate are unchanged. It is the removal of the free form from the income variables which has caused the loss of efficiency in Models II and III. Perhaps the simplest way to show the full effect of the changes in the form of the models is to reconstruct the variance analysis shown in Table 3, but adding the new-found components due to the specified linear form of the income variable in Model II and the log form in Model III. The results are summarised in Table 7 below. The point to note is that, although in both Models II and III an assumed form has been placed on the income variable, the total sum of squares attributable to this factor is unchanged. All that has happened is that the sum of squares has now been partitioned in the way shown in the table. As was stated at the end of the last section, this provides some measure of how far the chosen, or assumed, form of the continuous variable approximates to the free form provided by Model I. But whereas in the variance analysis approach it is possible to retain the separate components to see whether the non-linear component is, itself, significant, in the regression approach used above the non-linear and non log-linear components were respectively included with the two and three-factor interaction sum of squares to make up the composite ‘‘error’’ sum of squares.


Only a few lines need now be spent on the 
interpretation of the coefficients given in Table 6 
Models II and III. In Model II the coefficient of the 
income factor suggests that the labour force partici- 
pation rate of the wife declines by 2.26 percentage 
points for every $1,000 increase in the husband’s 
income, while from Model III the conclusion would 
be that a doubling of the husband’s income would 
cause a drop of 6.33 percentage points in the wife’s 
participation rate. The estimated labour force participation
rates for the same factor/level combination
as was calculated for Model I on page 19 are:


MODEL II
45.38 - 13.66 - 8.42 - (2.26 x 6) = 9.74
MODEL III
43.73 - 13.66 - 8.42 - (6.33 x 2.585) = 5.29
where 2.585 is log₂ 6.
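
The same arithmetic can be written as two small Python functions, using the rounded coefficients quoted in the text and the mid-point of 6 ($'000) for the $5,000-$6,999 class. The function names are illustrative, and the evaluation at 16 (the value used for the open-ended class) is added here only to show how the estimates behave at the top of the income scale.

```python
from math import log2

# Rounded contributions for children under 6 and elementary education.
CHILD_UNDER_6 = -13.66
ELEMENTARY = -8.42


def model_ii(income_thousands):
    """Model II: income on the natural scale ($'000)."""
    return 45.38 + CHILD_UNDER_6 + ELEMENTARY - 2.26 * income_thousands


def model_iii(income_thousands):
    """Model III: income on the log2 scale ($'000)."""
    return 43.73 + CHILD_UNDER_6 + ELEMENTARY - 6.33 * log2(income_thousands)


print(round(model_ii(6.0), 2), round(model_iii(6.0), 2))
print(round(model_ii(16.0), 2), round(model_iii(16.0), 2))
```

At an income of 6 the functions return approximately 9.74 and 5.29; at 16 both estimates are negative.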


TABLE 7. Analysis of Variance including Non-linear Components of the Main Effects

Source of variation | Sum of squares | Degrees of freedom | Mean squares
Main effects:
Child status | [illegible] | [illegible] | [illegible]
Education | 2,260.9 | [illegible] | [illegible]
Income of husband | [illegible] | 5 | [illegible]
Model II:
(Linear component) | [illegible] | [illegible] | [illegible]
(Non-linear component) | [illegible] | (4) | (163.6)
or Model III:
(Log-linear component) | [illegible] | (1) | [illegible]
(Non log-linear component) | [illegible] | (4) | [illegible]
Total, Model I | [illegible] | [illegible] | [illegible]
Residual = interaction effects: | 2,507.3 | 44 | 57.0
(Second-order interaction effects) | (2,049.5) | (24) | (85.4)
(Third-order interaction effects) | (457.8) | (20) | (22.9)
Total | 18,188.8 | 53 |


These estimates obviously look reasonable 
when compared with the actual value of 10.14 per 
cent but they have only been shown here by way of 
illustration and it will be left to the reader to see 
that when the level of the income variable is changed 
the estimated participation rates can become 
negative. 


The last part of this section has shown that when all the factor/levels are represented by dummy variables the model can produce a ‘‘good fit’’ of the data together with providing easily interpreted results. This was clearly the case when it was applied to the data used in this study. However, there are certain shortcomings in the method. Interaction terms are likely to be ignored and estimates can easily be obtained which are outside the theoretical bounds of the variable. Furthermore these problems would appear to be aggravated when it is wished to include one of the factors as a continuous variable.

The next section will now consider two ways in which one of the disadvantages of the models discussed in this section, namely the problem of constraining the estimates to lie within the acceptable range of zero to 100 per cent, can be overcome. It will also be shown that these ‘‘transformations’’ reduce the size of the interaction effects, thus suggesting that the two main disadvantages of the simple additive model mentioned earlier may not be independent but simply different aspects of the same problem: that of formulating a more conceptually correct labour force participation rate model.


IV. PROBIT AND LOGIT TRANSFORMATIONS


Probit Transformation


The factors which influence the decision of individuals to enter the labour force are numerous. But it can be assumed that, in the last analysis, it is on the individual’s (or family’s) reaction to the social and economic environment that the ultimate decision rests.

If these environmental pressures are thought of as stimuli and if the intensity of a given stimulus can be represented on a continuous scale, then the probit approach is premised on the assumption that each individual in the population has a threshold²⁰ value of the stimulus such that for a higher value he will always respond and for a lower value he will never respond. Let the distribution of these threshold values in the population be described by a probability density function $f(x)$, $(-\infty < x < \infty)$. The probability that a person chosen at random from a population has a threshold value in the range $x_0$ to $x_0 + \Delta x$ is given by $f(x_0)\,\Delta x$ and the proportion of the population with a threshold value above $x_0$ is given by

$$P(x_0) = \int_{x_0}^{\infty} f(x)\,dx \qquad (10)$$

If the population is the population of married women, the stimulus is the husband’s income and the response is withdrawal from (or failure to enter) the labour force. An individual’s threshold value, then, is the level of the husband’s income above which the individual will withdraw from (or will not enter) the labour force and equation (10) gives the participation rate of married women with husband’s income greater than $x_0$.

20 The term ‘‘threshold’’ was originally used by psychologists who can claim to have first used probit analysis although they did not recognise its full potential at the time. Biometricians who developed the technique prefer to use the term ‘‘tolerance’’ in biological assay work.


The probit technique now assumes that the underlying distribution of threshold values is a normal distribution $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$, or can be made so by some transformation of the stimulus (i.e. income variable).

We then have, for equation (10):

$$P(x_0) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_0}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \qquad (11)$$

The probit, $y$, of a proportion $P$ is defined by Finney²¹ as ‘‘the abscissa corresponding to a probability P in a normal distribution with mean 5 and variance 1’’ or, more precisely:

$$P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y-5} e^{-\frac{u^2}{2}}\,du \qquad (12)$$

If the right hand sides of equations (11) and (12) are now equated it can be shown that:

$$y_0 = 5 - \frac{x_0 - \mu}{\sigma}$$

or more generally:

$$y_0 = a + b x_0$$

where a and b are constants.


21 See D.J. Finney, Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve, op. cit.

Thus if the observed proportions are replaced by their probits (obtainable from tables) we can fit a model which is linear over the range $-\infty$ to $+\infty$ by the standard techniques: we can then reconvert the fitted model to proportions as required. Observe that the probits corresponding to proportions of 0 and 1 do not exist. Chart 2 illustrates the relationship between a percentage or proportion and its probit.
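
Where the text refers to tables of probits, the same conversion can be sketched with the Python standard library. The function names are illustrative; the offset of 5 follows Finney's definition quoted above.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(proportion):
    """Probit of a proportion: 5 plus the standard normal deviate."""
    if not 0.0 < proportion < 1.0:
        # Probits of 0 and 1 do not exist (they lie at minus and plus infinity).
        raise ValueError("proportion must lie strictly between 0 and 1")
    return 5.0 + _STANDARD_NORMAL.inv_cdf(proportion)


def proportion_from_probit(y):
    """Inverse transformation: probit back to a proportion."""
    return _STANDARD_NORMAL.cdf(y - 5.0)


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    y = probit(p)
    print(f"{p:.2f} -> probit {y:.3f} -> {proportion_from_probit(y):.2f}")
```

A proportion of 0.5 maps to a probit of 5, and proportions below 0.5 map to probits below 5; a value fitted on the probit scale always converts back to a proportion strictly between 0 and 1.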


There is, however, no reason why probit analysis
should be confined to a single independent
variable such as the income of the husband. Indeed,
referring back to equation (5) on page 16, a new 
equation can be postulated in which the probit of 
the labour force participation rate of married women 
is now assumed to be made up of a constant term 
plus a contribution associated with the child status 
of the family, plus a contribution associated with 
the wife’s level of education and a contribution 
which will vary with the income of the husband. 
Three models, using the probit transformation were 
examined: Models IA, IIA, and IIIA which correspond 
in form to the three simple additive models discussed 
in the previous section. In Model IA the income of 
the husband is allowed to take a free form defined 
by dummy variables so that no normalising assump- 
tion is required. In Model IIA income takes a linear 
form which assumes that the distribution of thresh- 
olds is normal along the natural income scale. In 
Model IIIA income is included in a log form which 
assumes that the distribution of thresholds is normal 
along a scale defined by the logarithm of the income. 


It may be appropriate at this stage to consider briefly what would, in the example used in this study, be a normalising transformation for the income scale. Table 1 gives some clues here. The shape of the curve of labour force participation rates along the income scale can, in all six cases, be seen to be a reverse s-shape. The participation rate is fairly stable at a high level over the first three income groups, from, say, no income to an average of $4,000 a year. From this point it falls fairly rapidly to the class of $7,000-$9,999 and then the rate of decrease in the participation rate slows down between this class and the $10,000 and over class. Now it has already been mentioned that the open-ended class has a probable average income value of something approaching $16,000 a year, so that it can be seen that for the reverse s-shape to be nearly symmetrical, as it has to be to approximate to a normal (reversed) ogive, the ‘‘tail’’ of the distribution in the upper income range has to be considerably shortened. This can be done quite easily by taking the logarithm of the income as the measurement scale over which the distribution is viewed. It was for this reason that the logarithm of the income has been introduced into models examined in this study. In addition to the three regression analyses a full analysis of variance was carried out on the transformed variable. The results of the analysis of variance are given in Table 8, and those of the three regression analyses in Table 9. Comparing the results of the analysis of variance applied to the original participation rates given in Table 3 with that of their probit transformation, it can be seen that the interaction effects which were relatively large when the simple additive model was used have been reduced in size by the use of the probit transformation. However, if the three-factor interaction mean square is again taken as an estimate of σ² then the two-factor interactions are still significant.


CHART 2. Relationship between Cumulative Normal Distribution and Probit Transformation

[Chart not reproduced. Two panels. Cumulative normal: labour force participation rate (per cent) against the variable scale. Probit transformation: probit of labour force participation rate against the variable scale.]


TABLE 8. Analysis of Variance with Probit Transformation

Source of variation | Sum of squares | Degrees of freedom | Mean squares | Variance ratio¹
Main effects:
Child status | [illegible] | 2 | [illegible] | 204.78
Education | 1.7584 | 2 | 0.8792 | [illegible]
Income of husband | 8.7189 | 5 | 1.7438 | [illegible]
Second-order interactions:
Child status/education | 0.4992 | 4 | 0.1248 | 8.83
Child status/income | 0.3929 | 10 | 0.0393 | 2.78
Education/income | 0.5940 | 10 | 0.0594 | 4.20
Third-order interaction:
Child status/education/income | 0.2826 | 20 | 0.0141 |
Total | 18.0330 | 53 | |

¹ Obtained by dividing the mean squares of the main effects and second-order interaction effects by the mean square of the third-order interaction.


TABLE 9. Regression Equations of Probits of Labour Force Participation Rates of Married Women in Urban Ontario

Coefficients of: Child status | Education | Husband's income ($'000)

Model IA: R² = 0.9019, N = 54, constant 4.436
Children under 6 - 0.445 | Elementary - 0.231 | 10+ - 0.666
No children under 6 + 0.114 | Secondary + 0.021 | 7-10 - 0.396
No children + 0.332 | University + 0.209 | 5-7 + 0.028
 | | 3-5 + 0.267
 | | 1-3 + 0.399
 | | 0-1 + 0.369
(Standard error of coefficients) (0.067) | (0.067) | (0.095)

Model IIA: R² = 0.8641, N = 54, constant 4.902
Children under 6 - 0.445 | Elementary - 0.231 | Continuous variable
No children under 6 + 0.114 | Secondary + 0.021 | $'000 natural scale - 0.076
No children + 0.332 | University + 0.209 |
(Standard error of coefficients) (0.075) | (0.075) | (0.006)

Model IIIA: R² = 0.7591, N = 54, constant 4.844
Children under 6 - 0.445 | Elementary - 0.231 | Continuous variable
No children under 6 + 0.114 | Secondary + 0.021 | $'000 on log₂ scale - 0.210
No children + 0.332 | University + 0.209 |
(Standard error of coefficients) (0.100) | (0.100) | (0.025)


The amount of variation explained by each of the three regressions was in all cases greater than when the corresponding model was applied to the untransformed data. When income was included as a dummy variable the value of R² rose from 0.862 in Model I to 0.902 in Model IA so that in this form, and using the probit transformation, less than 10 per cent of the variation now remains unexplained. With income as a linear variable in Model IIA, R² was 0.864 and when log income was used in Model IIIA it was 0.759. In the latter two cases the improvement in the explained variance was 3.8 and 2.2 percentage points respectively. The reason why log income should fail to give a superior result over the linear income form is again not difficult to see but
an explanation of this will be left to the next sec- 
tion. The estimates obtained from the probit regres- 
sion equations were converted back to proportions 
and the differences between these and the original 
observations were used to obtain a measure of how 


much of the variation in the participation rates had 
been explained by the probit models. Table 10 sum- 
marises the results of this exercise and compares 
them with the measures of explained variation ob- 
tained directly from the probit and also from the 
simple additive regression equations. 
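
The conversion of a fitted probit back to a participation rate can be sketched in Python. The coefficients are those of Model IA in Table 9 for one group (children under 6, elementary education) at two income classes; the dictionary keys and function name are illustrative assumptions.

```python
from statistics import NormalDist

# Selected Model IA coefficients from Table 9 (probit scale).
MODEL_IA = {
    "constant": 4.436,
    "children under 6": -0.445,
    "elementary": -0.231,
    "income 5-7": 0.028,
    "income 10+": -0.666,
}


def fitted_percentage(*terms):
    """Sum the constant and the named terms on the probit scale,
    then convert the fitted probit to a participation rate in per cent."""
    fitted_probit = MODEL_IA["constant"] + sum(MODEL_IA[t] for t in terms)
    return 100.0 * NormalDist().cdf(fitted_probit - 5.0)


middle = fitted_percentage("children under 6", "elementary", "income 5-7")
top = fitted_percentage("children under 6", "elementary", "income 10+")

print(round(middle, 1), round(top, 1))
```

The two fitted probits are 3.788 and 3.094, which convert to roughly 11.3 and 2.8 per cent. Unlike the additive estimate for the highest income class, neither can fall outside the range of zero to 100 per cent.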


TABLE 10. Percentage of Total Variation Explained by Six Regression Models

Form of dependent variable in regression equations: | Proportions¹ | Probits | Probits
Form of dependent variable from which variation is calculated: | Proportions¹ | Probits | Proportions¹
(per cent)
Form of income variable:
Dummies | 86.2 | 90.2 | 90.4
Linear | 82.6 | 86.4 | [illegible]
Logarithm | [illegible] | [illegible] | [illegible]

1 Labour force participation rates. 


From this table it can be seen that the amount
of the variation in the original participation rates
which is explained by the participation rates obtained
from the estimated probits is, in the case of
Models IA and IIA, even greater than the amount of
variation in the original probits explained by the 
probit regression. But perversely, when the logarithm 
of income was used, the efficiency of the estimating 
equation falls (when the probits are converted back 
to proportions) to even less than that of the equiva- 
lent additive model. 


The results obtained from Model IA will now be 
used to show how the probit transformation works, 
and why, on a priori grounds, it should give results
superior to those obtained from the equivalent simple 
additive model. On page 19 it was shown that the 
labour force participation rate of married women in 
urban Ontario who had children under six, elementary 
school education, and husbands with an income 
between $5,000 and $6,999 was estimated, from the 
simple additive model using dummies for the income 
variable, at 9.70 per cent compared with an actual 
of 10.14 per cent. But when, keeping the other vari- 
ables unchanged, the husband’s income is in the 
$10,000 and over class the estimated participation 
rate of the wives fell to the absurd figure of -10.37 
per cent compared with an actual of 7.53 per cent. 


Now in order to obtain the estimates for the 
same socio-economic groups obtained from the probit 
transformation in Model IA, it is first necessary to 
calculate the estimate of the appropriate probits. 
From Table 9 these are found to be: 


4.436 - 0.445 - 0.231 + 0.028 = 3.788
and
4.436 - 0.445 - 0.231 - 0.666 = 3.094


which convert to 11.27 per cent and 2.83 per cent
respectively. Taken together these estimates are
now much closer to the original ones. But more important,
from the point of view of an example in the use
of probits, is the effect that this transformation has 
on the change in the estimated labour force partici- 
pation rate arising from a change in the income 
variable. It will be remembered that the conclusion 
obtained from the simple additive model was that 
the average labour force participation rate of married 
women with husbands earning over $10,000 a year 
was just over 20 percentage points lower than that 
of wives with husbands earning between $5,000 and 
$6,999 a year and that this applies regardless of the 
other social or economic characteristics of the mar- 
ried women. The probit model, on the other hand, 
says that there is a fall of 0.694 in the probits of 
the participation rates between the same income 
groups. And, as was seen in the above example,
this only represented a drop of about 8½ percentage
points when the higher estimate was just over 11 per 
cent. But if the higher estimate had been equivalent 
to a participation rate of 50 per cent (equal to a 
probit of 5) then a fall of 0.694 in the probit would 
be equivalent to a drop in the participation rate of 
more than 25 percentage points. 
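The arithmetic of this example can be checked with a short sketch. It assumes the usual convention, implied by the text, that a probit is a standard normal deviate plus 5, and it reads the Model IA figures quoted above as 4.436, -0.445, -0.231, +0.028 and -0.666; the function and variable names are illustrative only.

```python
import math


def probit_to_percent(probit):
    """Convert a probit (normal deviate plus 5) to a participation rate in per cent."""
    z = probit - 5.0
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Model IA coefficients as read from the two probit sums quoted in the text.
constant, children_under_6, elementary = 4.436, -0.445, -0.231
income_5000_6999, income_10000_plus = 0.028, -0.666

probit_mid = constant + children_under_6 + elementary + income_5000_6999
probit_high = constant + children_under_6 + elementary + income_10000_plus

rate_mid = probit_to_percent(probit_mid)      # husbands earning $5,000 to $6,999
rate_high = probit_to_percent(probit_high)    # husbands earning $10,000 and over

# The same fall of 0.694 in the probit, starting from a rate of 50 per cent.
drop_from_50 = probit_to_percent(5.0) - probit_to_percent(5.0 - 0.694)

print(f"{rate_mid:.1f} {rate_high:.1f} {drop_from_50:.1f}")
```

The point of the comparison is that a fixed change on the probit scale corresponds to a small change in percentage points near the bounds and a much larger one near 50 per cent.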


The next section will now examine an alternative
transformation which, although doing essentially
the same job as the probit, is derived from a
completely different set of assumptions. A comparison
of the two transformations will naturally be left to
the end of the section.


Logit Transformation 


An alternative approach, then, to the problem 
of non-additivity and that of constraining the esti- 
mated participation rate to keep between the bounds 
of reality, is to consider the variable directly rather 
than the forces which generate the variable as is 
done in the case of the probit. 


If P is again the labour force participation rate 
then it would seem reasonable to assume that the 
ratio of the rate of change of P to the value of P is 
a function of how far P can go before reaching its 
upper limit. If the relationship is assumed to be 
linear and the independent variable is again assumed 
to be the income of the husband (I) we have the 
simple differential equation: 


$$\frac{dP/dI}{P} = b(1 - P) \tag{13}$$

whose solution is:

$$\log_e \frac{P}{1 - P} = k + bI \tag{14}$$

where k represents the constant of integration.
Rearranging and changing the notation slightly
equation (14) reduces to:

$$P = \frac{1}{1 + \exp\{-(a + bI)\}} \tag{15}$$


The curve described by this formula has been called 
the logistic curve and Berkson, who first used the 
linearising transformation given by the first term in 
equation (14), coined the word logit to describe the 
expression: 


$$\log_e \frac{P}{1 - P}$$
Before looking at the use of this logit transformation
in the models examined in this study, some
points of interest arise out of the nature of the
logistic curve. From equation (15) it can be seen
that as the variable I tends to become a large positive
number, the denominator, 1 + exp{-(a + bI)},
tends to one so that P also tends to one, while as
I tends to become a large negative number the
denominator becomes increasingly large and P tends
to zero. The logit transformation will therefore
ensure that estimates of P will always lie between
0 and 1. Moreover, like the probit, the logit
transformation turns a bounded variable into an unbounded
one which is linear in form. Thus from equation (15):

$$\log_e \frac{P}{1 - P} = a + bI$$


There is one other similarity between the logit 
and the probit. It can be shown that the rate of 
change in P reaches a maximum when P is equal to 
0.5 and that it is symmetrical. The logistic curve is 
therefore very similar to the cumulative normal. They 
are both asymptotic to zero and one, and are
symmetrically s-shaped with a maximum gradient when
P = 0.5.
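The properties just described can be illustrated with a minimal sketch. The parameter values a = -3 and b = 0.5 are hypothetical and chosen only so that the mid-point falls at I = 6; the function names are likewise illustrative.

```python
import math


def logit(p):
    """Berkson's logit: log_e(P / (1 - P)), defined for 0 < P < 1."""
    return math.log(p / (1.0 - p))


def logistic(a, b, income):
    """Equation (15): P = 1 / (1 + exp{-(a + b*I)})."""
    return 1.0 / (1.0 + math.exp(-(a + b * income)))


def slope(a, b, income):
    """Rate of change dP/dI = b * P * (1 - P), from equation (13)."""
    p = logistic(a, b, income)
    return b * p * (1.0 - p)


a, b = -3.0, 0.5  # hypothetical parameters, for illustration only

for income in (-40.0, 0.0, 6.0, 12.0, 40.0):
    p = logistic(a, b, income)
    # P stays strictly between 0 and 1 while its logit is linear in I.
    print(income, round(p, 4), round(logit(p), 4))
```

The logit of each fitted value reproduces a + bI, and the slope is largest where P = 0.5 and falls away symmetrically on either side.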


As in the probit model described earlier, the 
expected shape of the logistic curve, when the 
income of the husband is the independent variable 
on a continuous scale, is that of a reverse s-shape
since the labour force participation rates of married 
women are expected to decline as the income of the 
husband rises. This makes no difference to the fore- 
going discussion since the curve is symmetrical. 
However, to meet the condition the actual form of 
the curve which was estimated is: 


$$P = 1 - \frac{1}{1 + \exp\{-(a + bI)\}}$$

which reduces to:

$$P = \frac{1}{1 + \exp\{a + bI\}} \tag{16}$$


The logit transformation is therefore unchanged
since the only effect of reversing the shape of the
curve is that the signs of the parameters are also 
reversed. 


The regression equations of the logits of the 
labour force participation rates took the same three 
forms as those for the simple additive models and 
those based on the probit. 


The coefficients obtained from the results of 
these three regression models are given in Table 11. 
The proportions of the total variation explained were 
all larger than those obtained in the corresponding 
probit models but in no case was the difference more 
than one half of one percentage point. When the 
‘‘explained’’ variation using the logit transformation
is calculated on the original units, the differences 
between the logit and probit methods are in favour 
of the probit in two out of three models and in favour 
of the logit in the other one. However, the differ- 
ences were again so small as to be negligible. 


The last comparison to be made, then, is be- 
tween the estimates themselves. It is just con- 
ceivable that the allocation of the sum of squares 
could be nearly the same, using the two transfor- 
mations, but with the predicted values for factor/ 
level combinations noticeably different. Here again, 
however, there is no evidence to show that this is 
the case and it must be concluded that the probit 
and the logit transformations do essentially the 
same job. To illustrate this point, the estimated 
labour force participation rates obtained using the 
probit and logit transformations are given in Table 12
for the first six and last six observations. 
TABLE 11. Regression Equations of Logits of Labour Force Participation Rates
of Married Women in Urban Ontario

                                        Coefficients of

              Constant | Child status                    | Education          | Husband's income ($'000)

Model IB

R²=0.9067     - 0.954  | Children under 6    [illegible] | Elementary - 0.377 | 10+   [illegible]
N =54                  | No children under 6 + 0.204     | Secondary  + 0.029 | 7-10  - 0.687
                       | No children         + 0.568     | University + 0.349 | 5-7   + 0.058
                       |                                 |                    | 3-5   + 0.466
                       |                                 |                    | 1-3   + 0.686
                       |                                 |                    | 0-1   + 0.639
(Standard error
of coefficients) .......... (0.112) (0.112) (0.159)


Model IIB

R²=0.8685     - 0.144  | Children under 6    [illegible] | Elementary - 0.377 | Continuous variable
N =54                  | No children under 6 + 0.204     | Secondary  + 0.029 | $'000 natural scale [illegible]
                       | No children         + 0.568     | University + 0.349 |
(Standard error
of coefficients) .......... (0.127) (0.127) (0.010)


Model IIIB

R²=0.7596     - 0.246  | Children under 6    [illegible] | Elementary - 0.[illegible] | Continuous variable
N =54                  | No children under 6 + 0.204     | Secondary  + 0.029 | $'000 on Log_e scale - 0.364
                       | No children         + 0.568     | University + 0.349 |
(Standard error
of coefficients) .......... [illegible]


TABLE 12. Probit and Logit Estimates Compared

                                                  Estimated participation rates
Child status, education and income of husband        Probit        Logit

                                                          per cent
Children under 6:

  Elementary:
    $10,000 and over ...........................   [illegible]   [illegible]
    $7,000 - $9,999 ............................       5.1           5.8
    $5,000 - $6,999 ............................   [illegible]   [illegible]
    $3,000 - $4,999 ............................   [illegible]   [illegible]
    $1,000 - $2,999 ............................      20.0          19.5
    Under $1,000 ...............................   [illegible]      19.8

No children:

  University:
    $10,000 and over ...........................   [illegible]   [illegible]
    $7,000 - $9,999 ............................   [illegible]   [illegible]
    $5,000 - $6,999 ............................   [illegible]   [illegible]
    $3,000 - $4,999 ............................   [illegible]   [illegible]
    $1,000 - $2,999 ............................   [illegible]   [illegible]
    Under $1,000 ...............................   [illegible]   [illegible]
The question now arises, given that such a
transformation is called for, which of the two alternative
transformations to use when it is known that the
results and their immediate interpretation will be so
little different. But besides these purely practical
considerations there is still the fact that the
assumptions on which the two transformations are
based are fundamentally different, and in the last
analysis it is in terms of these assumptions that the
results must be interpreted. This is particularly so
when the assumptions purport to describe some
causal relationship, whether concerned with physical
or biological processes or, as is the case in this
study, with the physiological or psychological
makeup of human beings. In his paper, ‘‘Why I Prefer
Logits to Probits’’, Berkson had this to say:


‘‘If it is seriously believed that there is some
physical property more or less stably characterising
each organism, which determines whether
or not it succumbs, then it is justifiable to
advance the hypothesis of a distribution of
tolerances. In that case one should be prepared
to suggest the nature of this characteristic so
that the hypothesis may be capable of corroboration
by independent experiments. If on the
other hand the formulation is only that of a
‘‘mathematical model’’, to guide the method of
calculation, then it would seem more objective
and heuristically sounder not to create any
hypothetical tolerances, but merely to postulate
that the proportion of organisms affected follows
the integrated normal function. I am interested
in the slope of the dosage mortality line
as a ‘‘rate’’, of the objectively observed increase
of mortality with increase of dosage, not
as a standard deviation of hypothetical tolerances
of the animals. I should of course be very
much interested in the last, if tolerance of the
animals is what I was observing and studying.
But we are not dealing with measured tolerances,
we are dealing with a dosage mortality
curve, and when my probitistic friends present
a standard deviation of tolerances, they may be
asserting a substantial quantity for the variability
of something that in fact does not exist
at all. I once had a teacher of philosophy who
employed the Socratic method in class. When a
student gave a simple and especially plausible
explanation of a very complex social phenomenon
the professor said, ‘‘Please, sir, do not
make up history.’’ I should like to ask mathematical
statisticians when they formulate mathematical
models to please not make up physics.’’


It is for this reason that the present author prefers 
the logit transformation based on the logistic curve. 
Put quite simply it attempts to describe the rela- 
tionship between the effect of some stimulus and 
the stimulus itself without attempting to describe 
the causal process. 


V. GENERALISED LOGISTIC CURVE


In the last section it was seen that the use of 
either of two transformations, the probit and the 
logit, generally improved the predictive power of 
the models examined, bearing in mind, of course, 
that the simple additive model had already explained 
86 per cent of the total variation in the participation 
rates. It was also seen that the probit and logit
transformations differ very little from each
other in practice, if not in concept. However, as has
been suggested above, the logit transformation is 
based on assumptions which are, perhaps, more 
appropriate to the type of analysis being undertaken, 
and it is for this reason only that the logit, or rather 
the logistic curve has been used for the further 
developments introduced in this section. 


Consider again the participation rate P of a
given population. It seems reasonable to assume
that, instead of having limits of 0 and 1, P will in
practice have lower and upper limits, L and U, such
that 0 ≤ L ≤ P ≤ U ≤ 1.


If it is now further assumed that the rate of
change in P, with respect to I, relative to its position
measured from its lower limit, P-L, is proportional
to the remaining fraction of its total range,
then the differential equation which describes the
relationship is:

$$\frac{dP/dI}{P - L} = b\,\frac{U - P}{U - L} \tag{17}$$


This differential equation can now be solved to
show that:

$$P = L + \frac{U - L}{1 + \exp\{-(a + bI)\}} \tag{18}$$

where a is a constant.


However, since in the example used in this
study the shape of the curve is the reverse of the
more common pattern (i.e. the participation rate
declines as the income rises, so that the upper
asymptote is associated with a low income), the
form of equation required is that of (18) but with L
and U interchanged, or:

$$P = U - \frac{U - L}{1 + \exp\{-(a + bI)\}} \tag{19}$$


Now consider what happens to equation (19)
when I is assumed to take certain values. As the
exponent becomes increasingly large the denominator
of the last term on the right hand side of
equation (19) approaches 1 so that P approaches L,
the lower limit. But when the exponent becomes a
large negative number the same denominator becomes
very large and the last term in the equation
approaches zero so that P then approaches U, the
upper limit. Equation (19) then, which defines a
generalised version of the logistic curve, has a
lower limit L and an upper limit U. It can of course
be readily seen that if U is set equal to one and L
to zero, equation (19) is identical to equation (16).


It can further be shown that the curve is again 
symmetrically s-shaped with a maximum slope when: 


P =(L+U)/2 
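The behaviour of the generalised curve can be checked numerically with a short sketch. It takes equation (19) in the form implied by the description above (P approaching L as the exponent grows and U as it becomes large and negative); the asymptotes 0.05 and 0.40 and the parameters a = -3, b = 0.5 are hypothetical values chosen for illustration.

```python
import math


def generalised_logistic(lower, upper, a, b, income):
    """Equation (19): P = U - (U - L) / (1 + exp{-(a + b*I)})."""
    return upper - (upper - lower) / (1.0 + math.exp(-(a + b * income)))


def reverse_logistic(a, b, income):
    """Equation (16): P = 1 / (1 + exp{a + b*I})."""
    return 1.0 / (1.0 + math.exp(a + b * income))


lower, upper = 0.05, 0.40   # hypothetical asymptotes
a, b = -3.0, 0.5            # hypothetical location and slope parameters
midpoint_income = -a / b    # the exponent a + b*I is zero at this income

for income in (-60.0, midpoint_income, 60.0):
    print(income, round(generalised_logistic(lower, upper, a, b, income), 4))
```

At the income -a/b the curve passes through (L + U)/2, and with L = 0 and U = 1 the function coincides with equation (16).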


The reasoning behind the foregoing can best be
illustrated by examining the two graphs in Chart 3,
below, on each of which are plotted the original
six observations (one for each income level) for the
group of married women with children under six
years of age who have had a university education.
One graph is drawn with the income of the husband
on its natural scale and the other with income on a
logarithmic scale. Also plotted on each graph are
the estimates obtained from the regression model
using the logit transformation in which income was
represented as a dummy variable. For estimates
obtained from the other two models (with income
on its natural scale and with the logarithm of the
income) the fitted curves have been plotted on the
appropriate graph but drawn so as to go outside the
range of the original observations. This illustrates
the way in which they are forced to seek asymptotes
at 0 and 1 when a continuous scale is used. This,
of course, does not happen when income is allowed
to take a free form and is represented in the equation
by a set of dummy variables. It can also be seen on
this chart why the use of the logarithm of the income
variable did not improve the fit of the equation even
though, as may be observed, the curve of the original
data and ‘‘dummy’’ coefficients are clearly not
symmetrical on the arithmetic scale. Yet the same
original data plotted against the logarithm of the
husband’s income indicates that the best fit is still
likely to be obtained with income included in this
way, and with a curve which is theoretically
symmetrical and s-shaped, but with asymptotes,
particularly the upper asymptote, lying well inside the
theoretic bounds of zero and one.


CHART 3. Actual Participation Rates and Estimates Obtained from Logit Regression Models for Women with
Children under 6 Years of Age and with University Education, by the Income of the Husband

[First panel: regression estimates with income on natural scale, compared with regression estimates
with income as dummy variables. Vertical axis: labour force participation rate, per cent.]
When more than one population group is under
investigation the assumption, implicit in equation
(19), that U and L are constants is unrealistic.
Rather it must be assumed that they are functions
of the factors which define the population groups,
e.g. education of the wife, and child status. With
three levels of the child status factor and three
educational attainment levels there are therefore
nine pairs of asymptotes to be estimated. But
with this additional assumption, is it also reasonable
to assume that ‘‘a’’ and ‘‘b’’ in equation (19)
are constant between population groups? The
effect of changing ‘‘a’’ is to change the location of
the mid-point of the curve along the stimulus (income)
scale, while the effect of changing ‘‘b’’ is to change
both the mid-point of the curve and its slope.
[Chart 3, second panel: regression estimates with income on logarithmic scale, compared with
regression estimates with income as dummy variables.]
The assumption made in this study is that
wives with different social characteristics will not
respond in the same way to the same level of the
husband’s income, i.e. it is expected that each
curve will be located at a different point along the
income scale so that ‘‘a’’ must be allowed to vary.
But around this point it is assumed that the effect
on the participation rate of the same proportionate
change in income will be the same for all population
groups, i.e. the slope of the curve at the mid-point
will be the same and therefore ‘‘b’’ can be left as a
constant. It is implicit in the second assumption
that the logarithm of the husband’s income will be
used in the model.
In defining the form that the asymptotic values
take it would be possible to go back to the original
assumptions made at the beginning of this study and
assume that changes in the asymptotes, as the factor
levels are changed, are additive. This would be the
simplest thing to do but unless interaction effects
were included it would in theory permit the asymptotes
to go outside the range of 0 and 1. Instead,
the form of the model employed is one which constrains
both the upper and lower set of asymptotes
to keep within the bounds of 0 and 1 by assuming
that they also are defined by two multivariable
logistic curves of the form:

A_L = 1 / (1 + exp{ [illegible] })    (20)

A_U = 1 / (1 + exp{ [illegible] })    (21)
where A_L and A_U are the set of lower and upper
asymptotes respectively and where

X_1 = 1, X_2 = 0 when there are no children in
the family

X_1 = 0, X_2 = 1 when there are children under
six in the family

X_1 = -1, X_2 = -1 when there are children, but
none under six, in the family

X_3 = 1, X_4 = 0 for completed elementary
schooling or less

X_3 = 0, X_4 = 1 for some high school or completed
high school

X_3 = -1, X_4 = -1 for some university education
or with degree


The reason for the special form in which the
dummy variables for child status and education are
expressed is that it was also desirable from the
point of view of interpretation to be able to view
the coefficients for each factor as deviations from
the overall mean. But because the method of estimation
which had to be used was that of ‘‘non-linear
least squares’’, the dummies themselves were
constrained to produce results which could be
directly interpreted in this way. Thus if it is required
that the sum of three coefficients, say a_1, a_2 and a_3,
is to be equated to zero, or:

-(a_1 + a_2) = a_3

then the expression incorporating three normal
dummy variables, say:

a_1 Z_1 + a_2 Z_2 + a_3 Z_3

becomes:

a_1 Z_1 + a_2 Z_2 - (a_1 + a_2) Z_3

or:

a_1 (Z_1 - Z_3) + a_2 (Z_2 - Z_3)

so that the three original dummy variables have been
reduced to:

X_1 = (Z_1 - Z_3)

X_2 = (Z_2 - Z_3)

which will take the values indicated above.
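The reduction from three ordinary dummies to two constrained ones can be sketched as follows. The level names follow the child status factor defined earlier; the function names are illustrative, and the two coefficient values are borrowed from Table 9 purely as an example.

```python
LEVELS = ("no children", "children under six", "children, none under six")


def ordinary_dummies(level):
    """Z1, Z2, Z3: one 0/1 indicator for each level of the factor."""
    return tuple(1 if level == name else 0 for name in LEVELS)


def deviation_dummies(level):
    """X1 = Z1 - Z3 and X2 = Z2 - Z3."""
    z1, z2, z3 = ordinary_dummies(level)
    return (z1 - z3, z2 - z3)


a1, a2 = 0.332, -0.445   # illustrative coefficients only
a3 = -(a1 + a2)          # the constraint a1 + a2 + a3 = 0

for level in LEVELS:
    z = ordinary_dummies(level)
    x = deviation_dummies(level)
    full = a1 * z[0] + a2 * z[1] + a3 * z[2]
    reduced = a1 * x[0] + a2 * x[1]
    # Both forms give the same contribution for every level.
    print(level, x, round(full, 3), round(reduced, 3))
```

Because the third coefficient is implied by the other two, each estimated coefficient can be read directly as a deviation from the overall mean.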


Since no constraints need be placed on the
expanded form of the constant a in equation (19)
and if b in that equation is replaced by b_k, then
equation (19) is:

$$P = A_U - \frac{A_U - A_L}{1 + \exp\{-b_k\}} \tag{22}$$

where A_L and A_U are as defined in equations (20)
and (21) and

$$b_k = b_0 + b_{11} X_1 + b_{12} X_2 + b_{21} X_3 + b_{22} X_4 + b_3 I \tag{23}$$

in which I is the logarithm of the income of the
husband.


Equation (22), with an error term added, is the
model whose parameters were estimated from the
data used in this study.


The set of normal equations obtained from
equation (22) will contain non-linear terms so that
the classical least squares method cannot be employed
to provide estimates of the parameters. Moreover,
no linearising or other simple transformation
is available to assist in the estimation of the
parameters. However, techniques are available for the
estimation of parameters of models whose ‘‘normal
equations’’ contain non-linear terms and one of
these²² has been employed to do this. The technique
is one which requires an initial set of estimates
to be provided and which then uses an iterative
process to successively approximate to the
least squares solution. It is, however, well beyond
the scope of this study to go into any detail of the
method.
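The general idea of such an iterative procedure can nevertheless be indicated with a minimal sketch. This is not the program used in the study: it is a damped Gauss-Newton iteration in the spirit of Marquardt's algorithm, applied to the two-parameter curve of equation (16) rather than to the full model of equation (22). The income values, the "true" parameters used to generate the synthetic rates, and all function names are assumptions made for illustration.

```python
import math


def reverse_logistic(a, b, income):
    """P = 1 / (1 + exp{a + b*I}), evaluated without overflow."""
    z = a + b * income
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def sum_of_squares(a, b, incomes, rates):
    """Residual sum of squares for given parameter values."""
    return sum((y - reverse_logistic(a, b, i)) ** 2 for i, y in zip(incomes, rates))


def fit(incomes, rates, a, b, iterations=100):
    """Improve the initial estimates (a, b) by damped Gauss-Newton steps."""
    damping = 1e-3
    best = sum_of_squares(a, b, incomes, rates)
    for _ in range(iterations):
        # Accumulate J'J and J'r, where J holds dP/da and dP/db.
        jaa = jab = jbb = ga = gb = 0.0
        for i, y in zip(incomes, rates):
            p = reverse_logistic(a, b, i)
            da = -p * (1.0 - p)
            db = i * da
            r = y - p
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Solve (J'J + damping * diag(J'J)) * step = J'r for the 2x2 case.
        m_aa = jaa * (1.0 + damping)
        m_bb = jbb * (1.0 + damping)
        det = m_aa * m_bb - jab * jab
        if det == 0.0:
            break
        step_a = (ga * m_bb - gb * jab) / det
        step_b = (gb * m_aa - ga * jab) / det
        trial = sum_of_squares(a + step_a, b + step_b, incomes, rates)
        if trial < best:
            # Accept the step and trust the linear approximation more.
            a, b, best = a + step_a, b + step_b, trial
            damping /= 10.0
        else:
            # Reject the step and move more cautiously next time.
            damping *= 10.0
    return a, b, best


incomes = [0.5, 2.0, 4.0, 6.0, 8.5, 16.0]                  # $'000, hypothetical class values
rates = [reverse_logistic(-3.0, 0.5, i) for i in incomes]  # synthetic, noise-free rates
start_ss = sum_of_squares(0.0, 0.0, incomes, rates)
a_hat, b_hat, final_ss = fit(incomes, rates, a=0.0, b=0.0)
print(round(a_hat, 3), round(b_hat, 3), final_ss < start_ss)
```

A step is kept only when it lowers the residual sum of squares, so the result depends on the initial estimates supplied; with a more complex model, different starting values may lead to different solutions.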


Before proceeding to the results obtained using
this model it should be mentioned that one very real
problem was encountered in the estimation of the
parameters. This is a problem which is not uncommon
in the estimation of non-linear models in general
and one which has been previously met in an
examination of methods used to estimate the logistic
function in particular.²³ With a complex model the
surface described by the function of the residual
sum of squares, which is to be minimised, may be
so pitted with craters that any set of initial estimates
will cause the iterative process to converge
on a local minimum rather than the absolute minimum.
Indeed one set of quite plausible initial estimates
of the parameters of equation (22) gave rise
to a ‘‘solution’’ which had a ‘‘least squares’’ value larger
than the sum of squares about the mean.


22 See D.W. Marquardt, An Algorithm for Least-Squares
Estimation of Nonlinear Parameters, Journal of
the Society for Industrial and Applied Mathematics, Vol. 11,
No. 2, June, 1963.

23 See F.R. Oliver, Methods of Estimating the
Logistic Growth Function, Journal of the Royal Statistical
Society, Series C, Applied Statistics, Vol. 13, No. 2.


However, this situation can be dealt with, even
though it does introduce an element of subjectivity
to the method. The final estimates for equation (22),
which will be discussed later, were obtained by
first providing an initial set of estimates which were
then modified by the iterative process employed by
the method until a ‘‘least squares’’ solution was
reached. At the same time the computer was programmed
to print out not only the standard output
of the original values, the predicted values and
the differences, but also the upper and lower asymptotes
and a measure of the rate of decline, as the
income rose, from the upper to the lower asymptote.
From an inspection of this information it was then
not too difficult to ascertain whether the given
‘‘least squares’’ solution was associated with a
local minimum (which could be improved upon by
selecting a new set of initial estimates) or with a
value at or close to the true minimum which could
be accepted. This procedure was repeated until a
‘‘satisfactory’’ solution was found. This is obviously
where subjectivity enters into the process.
For it does not necessarily follow that values of
the residual sum of squares which are very close to
each other are always obtained from estimates of the
parameters which are themselves sufficiently close
as to give rise to the same interpretation. Fortunately
this latter problem was not encountered in
this study, at least not in the region of the very low
values of the residual sum of squares.


What now follows, then, is a brief description and
discussion of the results which were obtained based
on the least ‘‘least squares’’ solution found.


As was to be expected, a very high proportion
of the variation in the participation rates was 
explained — 93.85 per cent— which, despite the loss 
in the number of degrees of freedom because of the 
increase in the number of parameters in the model, 
gave a standard error term lower than that obtained 
in any other model. 


Turning now to the actual estimates of the parameters,
these are given in Table 13 together with
their estimated standard errors. It is important now
to remember that the parameters other than the
constant term associated with upper and lower
asymptotes have been constrained to measure the
deviations from the average for the given factor/level
combination. From this Table it can be seen that
while the estimates of the parameters associated
with the upper asymptote are all significant²⁴, supporting
the view that the upper limit of the participation
rates will vary from one socio-economic
group to another, there is no such support for a
similar view that the lower limit is also subject to
variation. It is beyond the scope of this study to
discuss the empirical analysis in detail, but when
it is remembered that it was assumed the average
income for the $10,000 and over income group was
$16,000 and that this average value was assumed
to obtain for each of the nine groups (not, on reflection,
a very plausible assumption) it would take a
much more detailed study to establish whether
significant variation in the lower asymptote was
present, or should be assumed to be present.


24 Because of possible correlations between the
parameter estimates the standard t test may give rise to
an overstatement of the significance of the estimates.


TABLE 13. Parameter Estimates and Standard Errors in Non-linear Regression Model

Parameter | Estimate | Standard error of estimate | t-value


The last group of parameters given in Table 13
determine the ‘‘path between the upper and lower
asymptote’’. Not all the estimates of these parameters
can be judged to be significant, but the same
qualification made above with regard to the estimates
associated with the upper and lower asymptotes
applies; non-significance again means not
significantly different from the average. It is perhaps
interesting to note, at this stage, just what the
coefficients b_11, b_12, b_21 and b_22 do to the path to
be followed between the asymptotes. The income at
which the participation rate is exactly half way
between the upper and lower asymptote is, from
equation (19), equal to -a/b. But it was later argued
that when more than one group was being considered,
with an assumption that their asymptotes would
vary, that at the same time it should be assumed
that the hitherto constant term ‘‘a’’ should also be
allowed to vary. It was for this reason that four
dummy variables were introduced into this part of
the expression, to represent the nine sub-groups,
with associated coefficients b_11 to b_22. The income
levels at which the participation rate is now half
way between the upper and lower asymptote for each
of the groups can now be obtained directly from the
expression:
These income levels, I_{(U+L)/2}, and the associated
upper and lower asymptotes for each of the
nine child status/education groups, are given in
Table 14 and the full set of results are graphically
illustrated in Chart 4. In Chart 4, as an aid to
interpretation, the participation rates have been plotted
against the natural income scale; the symmetry in
the curve which would have been present if the
logarithm of the income had been used has therefore
been lost.


Table 14 and Chart 4 illustrate how the operation
of the coefficients b_11 to b_22, referred to above,
is that of a shift mechanism which places the curve
at some point along the income scale independent
of the upper and lower limits. Thus it can be seen,
for example, that although the upper asymptote for
the participation rate of wives with some children,
but none under six, and with secondary school education
(50.6 per cent) is higher than that of wives
with children under six but with university education
(33.3 per cent) it is estimated that the income
at which the wife’s participation rate is most sensitive
to a change in the husband’s income is lower
($6,670 compared with $7,000) for the former group.


The question therefore arises as to what economic,
as opposed to mathematical, interpretation
can be placed on the position of the curve along the
continuous variable scale. The critical income of
$6,670 is different, even if by not very much, from
$7,000, but what does it mean? Similarly, the labour
force participation rate of wives who have a secondary
school education, but have no children, is estimated
to be most sensitive to change, as a result
of a change in her husband’s income, when that
income is about $8,100 a year. If there are children
under six in the family, the corresponding income
level is estimated to be $5,200. It is not sufficient
to say that the wife with no children is better able
to combine going out to work with the normal duties
of running a house, and will therefore still choose
to do so even when the husband’s income is at a
fairly high level. For what is being considered is
the income effect, either direct or indirect: the
direct effect of being more easily available for work
outside the home has already been taken care of in
the positioning of the upper and lower asymptotes.


TABLE 14. Upper and Lower Asymptotes and Income of Husband at (U+L)/2

Factor/level, education and child status


Similarly, the reason why the income of the 
husband at which the wife’s labour force partici- 
pation rate is most sensitive to change increases 
as her standard of education increases cannot be 
said to be due directly to the fact that the higher 
her education the greater the job opportunities open 
to her; this effect is again reflected in the upper
and lower limits placed on her participation rates.
Moreover, as has been shown above, the ranking
of the asymptotes does not necessarily agree with 
the ranking of this critical income value. 
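The distinction drawn above between the asymptotes and the critical income can be illustrated with a minimal sketch. The functional form used here is an assumption made for illustration only (a logistic curve in the logarithm of income, bounded by a lower asymptote L and an upper asymptote U) and is not necessarily the exact equation of Section V. The upper asymptotes and critical incomes are those quoted in the text; the lower asymptotes and the slope are hypothetical placeholders. Under this assumed form the rate equals (U+L)/2 at the critical income, and its response to a change in log income is greatest there, whatever the ranking of the asymptotes.

```python
import math


def participation_rate(income, lower, upper, mid_income, slope):
    """Assumed generalised logistic curve on a log-income scale.

    The rate falls from `upper` towards `lower` as income rises and
    equals (upper + lower) / 2 when income == mid_income.
    """
    z = slope * (math.log(income) - math.log(mid_income))
    return lower + (upper - lower) / (1.0 + math.exp(z))


def sensitivity(income, lower, upper, mid_income, slope, step=1e-4):
    """Central-difference derivative of the rate with respect to log income."""
    above = participation_rate(income * math.exp(step), lower, upper, mid_income, slope)
    below = participation_rate(income * math.exp(-step), lower, upper, mid_income, slope)
    return (above - below) / (2.0 * step)


# Upper asymptotes and critical incomes are those quoted in the text;
# the lower asymptotes and the slope are hypothetical placeholders.
GROUPS = {
    "secondary, children but none under six": dict(
        lower=5.0, upper=50.6, mid_income=6670.0, slope=3.0),
    "university, children under six": dict(
        lower=3.0, upper=33.3, mid_income=7000.0, slope=3.0),
}

for name, params in GROUPS.items():
    for income in (4000.0, params["mid_income"], 10000.0):
        rate = participation_rate(income, **params)
        change = sensitivity(income, **params)
        print(f"{name}: income {income:.0f}, rate {rate:.1f}, "
              f"d(rate)/d(log income) {change:.1f}")
```

Changing `mid_income` alone moves the curve along the income scale without altering either asymptote, which is the sense in which the position of the curve is separate from the direct effect of child status and education.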


It could further be shown that while a change 
in the position of the curve along the income scale 
will obviously change the participation rate asso- 
ciated with a given income level it will not change 
the average labour force participation rate for that
particular child status/educational attainment group. 
And yet the position of the curve is determined by 
the levels of the child status and educational attainment
factors. It can, therefore, be seen that this
‘‘shift mechanism’’ reflects the indirect effect which
these characteristics of the wives have on their
response, in terms of labour force attachment, to a
change in their husband’s income. In this sense it
is operating in a way similar to that of an interaction
effect. But, in addition, it would seem reasonable
to suggest that the (U+L)/2 income value
which locates the curve along the income scale can
also be thought of as a measure, or index, of the
wives’ expectation from employment, defined to
represent some vague notion of the expectation
which the wife has in terms of job and/or personal
satisfaction and the family’s economic standard
of living. It can thus be said that the higher
this income value the greater is the expectation
from employment for that group of wives, regardless
of the proportion of wives in that group who do go
out to work.


This, then, concludes the section on the model
based on the generalised logistic curve. What now
follows, in the final section, is a brief summary of
the study as a whole with some tentative conclusions
on the merits and demerits of the methods
examined.


VI. SUMMARY 


What, then, has been found out in this study?
On page 9 in the Introduction it was stated that 
the purpose of the study was to consider ways in 
which the labour force participation rates of the 
population, cross-classified by a set of demogra- 
phic, social or economic factors, can be related 
functionally to the different levels of those factors. 
The data used in the example were obtained from 
the 1961 Census for 54 groups of married women in 
urban Ontario who were categorised (1) by a child 
status factor which represents both the presence or 
absence of children in the family and the age of any 
children, (2) by the level of their educational attainment
and (3) by the income of their husband. Ten
models were examined: three which were termed
simple additive models; three using the probit
transformation; three based on the logit transformation;
and one which was a variant of the generalised
logistic curve. In addition, Section III examined
two analytical tools, the analysis of variance and
dummy variable regression analysis, which are
used in the examination of the first nine models.
Section II briefly examined the nature of the labour 
force participation rate as a statistical variable and 
discussed some of the problems encountered in 
its analysis. 


Table 15 now summarises the results with
regard to the proportion of the variation in the
participation rates explained by the 10 models which
were examined. The first point to note is that in all
models the explained variance was very high and
gives clear evidence of the significance of the effect
which the factors examined have on the labour force 
participation rates. This will generally be the case 
in such studies since the subject is now well docu- 
mented and the important factors have been identi- 
fied. The main interest in any study will therefore 
be centred on the values attached to the coefficients 
and in the interpretation which can be placed on the 
results. 


In each case where it applies (all except the 
generalised logistic curve) the proportion of the 
explained variation given in the table is that ob- 
tained without taking note of any interaction effect, 
either by incorporating variables into the model to 
specifically take care of known interactions (see
page 19) or by partitioning the model. For the simple
additive type of model without transformation and
the probit and logit models, it is clear that including
income as a continuous variable, either on a natural
or logarithmic scale, causes a significant, even if
small, reduction in the explained variation, particularly
so when the logarithm of the income was used.
Since the generalised logistic curve with income on 
a continuous (logarithmic) scale produced the best 
result in terms of this criterion it would seem that 
the choice of which model to use in practice is 
between: 
(1) A simple additive type of model with no transformation
of the dependent variable and with all factor/levels
represented by dummy variables, but with, in certain
circumstances, the means to test for, and incorporate,
‘‘interaction’’ effects,


(2) A model similar to (1) but with the dependent
variable transformed into ‘‘probits’’ (see page 21),


(3) As for (2) but with a ‘‘logit’’ transformation,


(4) A model based on the generalised logistic curve
with income on a continuous scale.


The first of the four models is undoubtedly the 
simplest to calculate and also the simplest to inter- 
pret (see page 19). The last is by far the most 
difficult to calculate but not necessarily, despite 
the complex form of the equation, the most difficult 
to interpret. It certainly has to be interpreted in a 
different and, perhaps, unusual way by reference, 
for example, to the change in the asymptotes caused 
by a change from one factor level to another; but 
given these constraints it is relatively easy to 
understand. 


The models using the probit and logit trans- 
formation fall between these two extremes in terms 
of the ease with which they can be used, although 
of the two the logit would have to be judged the 
simpler since this transformation can be more easily 
‘‘written in’’ as an option in standard computer 
programmes. However, providing that the number of
observations is not unduly large, use can be made
of the tables of the probit transformation which
partially overcomes the problem. The interpretation 
of the estimated coefficients based on the probit and 
logit transformation, at least in terms of their im- 
mediate effect on the estimated participation rates, 
is virtually identical. 


Turning now to the assumptions made in formu- 
lating the models it is not surprising to find that the 
ease with which the results can be interpreted is 
directly related to the simplicity of the assumptions 
behind the model. Certainly this is so in the case 
of the first model of the four now under considera- 
tion. Results based on the assumptions that the 
change in the labour force participation rate in 
response to a change in one of the factor levels 
is (a) independent of the level of the participation 
rate and (b) independent of the level of the other 
factor are very simple to interpret. But they may 
not always be very meaningful since the assumptions
are not very realistic. (See page 16.) In particular,
it was shown in Section III that the coefficients
obtained from analysis of the simple additive model 
could easily result in estimates of labour force 
participation rates, for certain factor/level combina- 
tions, which lie outside the range of 0 and 100 per 
cent. However, it was indicated that if the number 
of factors being examined were few, perhaps no 
more than three or four, then this problem may not 
be so serious as to invalidate the approach com- 
pletely. But this would, in part, depend on the extent 
of any significant interaction between the effects of
the different factors. For this reason it was advised
that prior analysis of the data by analysis of variance
would indicate whether significant interaction
effects were present which could then be allowed for.


TABLE 15. Proportion of Variation in Participation Rates Explained by Ten Models


¹ Figures in brackets denote the proportion of explained variation after the transformed estimates have been
converted back to participation rates.
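The risk that a simple additive model yields estimates outside the range of 0 to 100 per cent can be seen in a small sketch. The base rate and the factor effects below are hypothetical figures chosen for illustration and are not coefficients estimated in this study; the factor names are likewise placeholders. The logit-scale effects are constructed so that each factor, taken alone, produces the same change from the base rate as in the additive version.

```python
import math

# Hypothetical base rate and effects in percentage points; these are
# NOT estimates from the study.
BASE_RATE = 30.0
ADDITIVE_EFFECTS = {
    "children_under_six": -20.0,
    "elementary_education": -8.0,
    "husband_income_high": -9.0,
}


def logit(rate_pct):
    """Log-odds of a participation rate expressed in per cent."""
    return math.log(rate_pct / (100.0 - rate_pct))


def inverse_logit(value):
    """Convert a log-odds value back to a rate in per cent."""
    return 100.0 / (1.0 + math.exp(-value))


# Logit-scale effects that reproduce each single-factor change exactly.
LOGIT_EFFECTS = {
    name: logit(BASE_RATE + effect) - logit(BASE_RATE)
    for name, effect in ADDITIVE_EFFECTS.items()
}


def additive_estimate(levels):
    """Effects added directly to the participation rate."""
    return BASE_RATE + sum(ADDITIVE_EFFECTS[name] for name in levels)


def logit_estimate(levels):
    """Effects added on the logit scale, then converted back to a rate."""
    total = logit(BASE_RATE) + sum(LOGIT_EFFECTS[name] for name in levels)
    return inverse_logit(total)


all_levels = list(ADDITIVE_EFFECTS)
print("additive estimate:", round(additive_estimate(all_levels), 1))
print("logit estimate:   ", round(logit_estimate(all_levels), 1))
```

With all three hypothetical effects present, the additive estimate falls below zero, whereas the estimate built on the logit scale necessarily remains between 0 and 100 per cent.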


It was seen that although the assumptions made 
in formulating the probit and logit models are totally 
different they happen in practice to yield essentially 
the same results. The practical advantage of either 
of these two transformations is that they constrain 
the estimates to lie between 0 and 100 per cent and 
in doing so allow for the fact that the change in 
participation rate arising from a change in the level 
of one of the factors is not independent of the par- 
ticipation rate. The difference between the two sets 
of assumptions behind these transformations has 
already been noted in Section IV on page 27, and 
need not be explored in detail here. The use of the 
logit requires no assumption to be made about the 
underlying causal mechanism, if one exists, between 
the wife’s participation in the labour force and the 
socio-economic characteristics of the family. The 
probit transformation, on the other hand, is based 
on the assumption of a specific causal process. 
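The near equivalence of the two transformations in practice can be checked numerically. The sketch below is an illustration only: it uses the standard normal deviate as the probit (tabulated probits conventionally add 5 to this value) and compares it with the logit over a range of participation rates. If the two are roughly proportional, coefficients fitted on either scale will lead to much the same estimated rates.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def logit(rate_pct):
    """Log-odds of a rate expressed in per cent."""
    p = rate_pct / 100.0
    return math.log(p / (1.0 - p))


def probit(rate_pct):
    """Standard normal deviate corresponding to a rate in per cent."""
    return STANDARD_NORMAL.inv_cdf(rate_pct / 100.0)


def inverse_logit(value):
    """Convert a logit back to a rate in per cent."""
    return 100.0 / (1.0 + math.exp(-value))


def inverse_probit(value):
    """Convert a normal deviate back to a rate in per cent."""
    return 100.0 * STANDARD_NORMAL.cdf(value)


for rate in (10.0, 20.0, 30.0, 40.0, 60.0, 70.0, 80.0, 90.0):
    ratio = logit(rate) / probit(rate)
    print(f"rate {rate:4.0f}%  logit {logit(rate):6.3f}  "
          f"probit {probit(rate):6.3f}  ratio {ratio:.3f}")
```

Both inverse functions return values strictly between 0 and 100 per cent, which is the practical advantage noted above.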


The reason for modifying the logistic curve to 
provide the equation used in the last model was 
explained in some detail in Section V. It was argued 
that, when one of the variables was a continuous 
variable, it was reasonable to assume that the upper 
and lower limits of the participation rates for a
particular population group may not be 100 and zero
per cent. In every other respect the assumptions
incorporated in the generalised logistic curve are 
essentially the same as those for the logit model. 


If, for reasons mentioned at the end of Section
IV, the logit is to be preferred to the probit, which
of the three remaining models should be used in
practice?


When all the variables are represented by 
dummies there is clearly no case for using the 
model based on the modified logistic curve. But 
when a continuous variable is included both the 
additive model and that using the logit transforma- 
tion are liable to give poor results unless only a 
few factors are being examined. Similarly there is 
reason to believe that a simple additive model, with 
only dummy variables but with more than, say, three 
factors, will yield inferior estimates unless inter- 
action effects are specifically allowed for. In this 
case the logit transformation may, on a priori rea- 
soning, be expected to yield significantly better 
results, but the data used in this study did not allow 
this view to be tested. 


Which model to use will, therefore, have to be 
determined by the nature of the data and the expected 
ability of each model to represent that data. But 
the decision need not be entirely one of judgement. 
The use of variance analysis will indicate whether 
interaction effects are significant and have to be 
allowed for. And alternative regression models can 
often be fitted in the same computer run so that a 
direct comparison can be made of their ‘‘explana- 
tory power’’. 
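A direct comparison of ‘‘explanatory power’’ of the kind described can be sketched as follows. The table of rates is hypothetical (rows standing for child status levels, columns for education levels), and the fit is an unweighted main-effects fit for a balanced two-way table, which is an assumption of this sketch and not necessarily the procedure used in the study. The same additive fit is made once on the rates themselves and once on their logits, the latter being converted back to rates before the proportion of explained variation is calculated.

```python
import math


def logit(rate_pct):
    """Log-odds of a rate expressed in per cent."""
    return math.log(rate_pct / (100.0 - rate_pct))


def inverse_logit(value):
    """Convert a logit back to a rate in per cent."""
    return 100.0 / (1.0 + math.exp(-value))


def additive_fit(table):
    """Least-squares main-effects fit for a balanced two-way table:
    fitted[i][j] = row mean + column mean - grand mean."""
    n_rows, n_cols = len(table), len(table[0])
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [[row_means[i] + col_means[j] - grand_mean for j in range(n_cols)]
            for i in range(n_rows)]


def explained_variation(observed, fitted):
    """One minus residual sum of squares over total sum of squares."""
    obs = [value for row in observed for value in row]
    fit = [value for row in fitted for value in row]
    mean = sum(obs) / len(obs)
    total = sum((o - mean) ** 2 for o in obs)
    residual = sum((o - f) ** 2 for o, f in zip(obs, fit))
    return 1.0 - residual / total


# Hypothetical participation rates in per cent; not data from the study.
RATES = [
    [45.0, 52.0, 60.0],
    [30.0, 38.0, 50.0],
    [12.0, 18.0, 33.0],
]

raw_fit = additive_fit(RATES)
logit_fit = [[inverse_logit(value) for value in row]
             for row in additive_fit([[logit(r) for r in row] for row in RATES])]

print("additive model:", round(explained_variation(RATES, raw_fit), 4))
print("logit model:   ", round(explained_variation(RATES, logit_fit), 4))
```

Calculating the proportion on the original scale for both fits puts the two models on a common footing, in the manner of the bracketed figures in Table 15.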


This study has simply attempted to compare
alternative lines of approach which have been 
found useful in practice and which may be of use 
to others. 